Cannot create connection to DynamoDB in Java AWS lambda function - java

I have a Lambda function that I'm trying to use to connect to a DynamoDB table. I'm using this code to establish the connection:
...
context.getLogger().log("Before create client..");
AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.standard()
        .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
                "https://dynamodb.ap-southeast-2.amazonaws.com", "ap-southeast-2"))
        .build();
context.getLogger().log("After create client..");
...
The output I have from the function is as follows:
==================== FUNCTION OUTPUT ====================
{"errorMessage":"2017-07-28T01:11:34.092Z aeee6505-7331-11e7-b28b-db98038611cc Task timed out after 5.00 seconds"}
==================== FUNCTION LOG OUTPUT ====================
START RequestId: aeee6505-7331-11e7-b28b-db98038611cc Version: $LATEST
Before create client..END RequestId: aeee6505-7331-11e7-b28b-db98038611cc
REPORT RequestId: aeee6505-7331-11e7-b28b-db98038611cc Duration: 5003.51 ms Billed Duration: 5000 ms Memory Size: 256 MB Max Memory Used: 62 MB
2017-07-28T01:11:34.092Z aeee6505-7331-11e7-b28b-db98038611cc Task timed out after 5.00 seconds
As you can see, it times out while building the client and never prints the second log statement. Is there a reason it would time out rather than throw an exception, e.g. if there's an error with the IAM role or something? The DynamoDB region and the Lambda region are the same (Sydney, ap-southeast-2), so I'd have thought this would work.
The IAM role the lambda function is using has the following permissions:
AmazonDynamoDBReadOnlyAccess
AmazonS3ReadOnlyAccess
AWSLambdaBasicExecutionRole

Fixed it: I bumped the memory of the Lambda function up to 1024 MB. Seriously not sure why that was required, given the memory used was always around 60-70 MB :/

It's a memory issue. I changed the Lambda function to 1024 MB and it started working fine.
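For anyone hitting the same thing: Lambda allocates CPU in proportion to the configured memory, so at a low memory setting the one-time SDK client construction (class loading, TLS setup) alone can run past a 5-second timeout. Below is a minimal sketch, assuming the AWS SDK for Java v1 and a hypothetical handler class, that builds the client once per container and lets the builder resolve the endpoint from the region:

import com.amazonaws.regions.Regions;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.util.Map;

public class Handler implements RequestHandler<Map<String, Object>, String> {

    // Built once per container, so the (CPU-heavy) client construction is not
    // paid on every invocation within the function timeout.
    private static final AmazonDynamoDB DDB = AmazonDynamoDBClientBuilder.standard()
            .withRegion(Regions.AP_SOUTHEAST_2)   // resolves the endpoint for you
            .build();

    @Override
    public String handleRequest(Map<String, Object> input, Context context) {
        context.getLogger().log("Client ready, tables: " + DDB.listTables(1).getTableNames());
        return "ok";
    }
}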


SCDF: Error handling when pod failed to start

I'm working on a service that calls Spring Cloud Data Flow (SCDF) to spin off a new Kubernetes pod for a Spring Batch job.
Map<String, String> properties = Map.of("testApp.cpu", cpu, "testApp.memory", memory);
LOGGER.info("Create task '{}' with definition '{}'", taskName, taskDefinition);
taskOperations.create(taskName, taskDefinition);
LOGGER.info("Launching task '{}' with properties {} and arguments '{}'", taskName, properties, args);
return taskOperations.launch(taskName, properties, args);
Everything works fine. The problem is that whenever we pull a non-existent image (e.g. due to some connection issue), the pod fails to start AND we end up with pending tasks (with NO batch jobs created whatsoever).
For example, we will have tasks in the task_execution table (an SCDF table) with an empty end time, but no related jobs in the batch_job_execution table.
It seems fine at first: since no pod is created, we don't consume any resources. But once the number of "pending jobs" reaches 20, we get the familiar error:
Cannot launch task testApp. The maximum concurrent task executions is at its limit [20]
I'm trying to find a way to detect that the pod spin-off has failed (so that we can mark the task as an error), but to no avail.
Is there a way to detect that the task launch has failed when that launch creates a new k8s pod?
UPDATE
Not sure if it is relevant, but we are using SCDF 1.7.3.RELEASE.
Output of kubectl describe for the failed pod:
Name:                 podname-lp2nyowgmm
Namespace:            my-namespace
Priority:             1000
Priority Class Name:  test-cluster-default
Node:                 some-ip.compute.internal/XX.XXX.XXX.XX
Start Time:           Thu, 14 Jan 2021 18:47:52 +0700
Labels:               role=spring-app
                      spring-app-id=podname-lp2nyowgmm
                      spring-deployment-id=podname-lp2nyowgmm
                      task-name=podname
Annotations:          iam.amazonaws.com/role: arn:aws:iam::XXXXXXXXXXXX:role/svc-XXXX-XXX-XX-XXXX-X-XXX-XXX-XXXXXXXXXXXXXXXXXXXX
                      kubernetes.io/psp: eks.privileged
Status:               Pending
IP:                   XX.XXX.XXX.XXX
IPs:
  IP:  XX.XXX.XXX.XXX
Containers:
  podname-lp2nyowgmm:
    Container ID:
    Image:          image_host:XXX/mysystem/myapp:notExist
    Image ID:
    Port:           <none>
    Host Port:      <none>
    Args:
      --spring.datasource.username=postgres
      --spring.cloud.task.name=podname
      --spring.datasource.url=jdbc:postgresql://...
      --spring.datasource.driverClassName=org.postgresql.Driver
      --spring.datasource.password=XXXX
      --fileId=XXXXXXXXXXX
      --spring.application.name=app-name
      --fileName=file_name.csv
      ...
      --spring.cloud.task.executionid=3
    State:          Waiting
      Reason:       ErrImagePull
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  8Gi
    Requests:
      cpu:     2
      memory:  8Gi
    Environment:
      ELASTIC_SEARCH_PORT:            80
      ELASTIC_SEARCH_PROTOCOL:        http
      SPRING_RABBITMQ_PORT:           ${RABBITMQ_SERVICE_PORT}
      ELASTIC_SEARCH_URL:             elasticsearch
      SPRING_PROFILES_ACTIVE:         kubernetes
      CLIENT_SECRET:                  ${CLIENT_SECRET}
      SPRING_RABBITMQ_HOST:           ${RABBITMQ_SERVICE_HOST}
      RELEASE_ENV_NAME:               QA_TEST
      SPRING_CLOUD_APPLICATION_GUID:  ${HOSTNAME}
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-xxxxx (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-xxxxx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-xxxxx
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  3m22s                 default-scheduler  Successfully assigned my-namespace/podname-lp2nyowgmm to some-ip.compute.internal
  Normal   Pulling    103s (x4 over 3m21s)  kubelet            Pulling image "image_host:XXX/mysystem/myapp:notExist"
  Warning  Failed     102s (x4 over 3m19s)  kubelet            Failed to pull image "image_host:XXX/mysystem/myapp:notExist": rpc error: code = Unknown desc = Error response from daemon: manifest for image_host:XXX/mysystem/myapp:notExist not found: manifest unknown: manifest unknown
  Warning  Failed     102s (x4 over 3m19s)  kubelet            Error: ErrImagePull
  Normal   BackOff    88s (x6 over 3m19s)   kubelet            Back-off pulling image "image_host:XXX/mysystem/myapp:notExist"
  Warning  Failed     73s (x7 over 3m19s)   kubelet            Error: ImagePullBackOff
1.7.3 is a very old release. We just released 2.7. The original logic used the task execution tables instead of the pod status. If the version you are using is subject to that, then it would explain what you are seeing. I strongly recommend an upgrade.
Thanks for the question. Looking at the source code, we don't include Pending pods when calculating the current number of executing tasks, so something else may be going on. 1) Could you run kubectl describe pod on a pod when it's in this state and post the result (status details)? 2) Is the deployer configured to create a job for each task? (false by default)
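If upgrading is not immediately an option, one workaround (not something SCDF 1.7 provides out of the box) is to poll the launched pod's container status yourself and treat ErrImagePull / ImagePullBackOff as a failed launch, then mark the task execution accordingly on your side. A rough sketch, assuming the fabric8 kubernetes-client and the task-name label visible in the describe output above; the class and method names are hypothetical:

import io.fabric8.kubernetes.api.model.ContainerStatus;
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

import java.util.List;

public class TaskLaunchProbe {

    // Returns true if any pod for the given task is stuck waiting on an image pull.
    public static boolean isImagePullFailure(String namespace, String taskName) {
        try (KubernetesClient client = new DefaultKubernetesClient()) {
            List<Pod> pods = client.pods().inNamespace(namespace)
                    .withLabel("task-name", taskName)  // label set by the Kubernetes deployer
                    .list().getItems();
            for (Pod pod : pods) {
                for (ContainerStatus status : pod.getStatus().getContainerStatuses()) {
                    if (status.getState().getWaiting() != null) {
                        String reason = status.getState().getWaiting().getReason();
                        if ("ErrImagePull".equals(reason) || "ImagePullBackOff".equals(reason)) {
                            return true;
                        }
                    }
                }
            }
            return false;
        }
    }
}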

Hazelcast custom timeout for operations

I am using "hazelcast.operation.call.timeout.millis = 100" configuration to timeout hazelcast operations.
But at the startup of the hazelcast some of the map size operation are getting timeout because of this configuration. I just only want to timeout the operations after the map load which are basically map get operations. Is there any way to add custom operation timeout for those map.get() operations ?
Is there any other way to get this done ???
com.hazelcast.core.OperationTimeoutException: HDMapSizeOperation got rejected before execution due to not starting within the operation-call-timeout of: 100ms. Current time: 2017-05-15 11:41:47.503. Start time: 2017-05-15 11:41:44.189. Total elapsed time: 3314 ms. Invocation{op=com.hazelcast.map.impl.operation.HDMapSizeOperation{serviceName='hz:impl:mapService', identityHash=1941379381, partitionId=0, replicaIndex=0, callId=-24461, invocationTime=1494828707296 (2017-05-15 11:41:47.296), waitTimeout=-1, callTimeout=100, name=blockMap}, tryCount=250, tryPauseMillis=500, invokeCount=11, callTimeoutMillis=100, firstInvocationTimeMs=1494828704189, firstInvocationTime='2017-05-15 11:41:44.189', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 05:30:00.000', target=[192.168.2.204]:5701, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=null}
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.newOperationTimeoutException(InvocationFuture.java:151)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:99)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:75)
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:155)
at com.hazelcast.spi.impl.operationservice.impl.InvokeOnPartitions.retryFailedPartitions(InvokeOnPartitions.java:143)
at com.hazelcast.spi.impl.operationservice.impl.InvokeOnPartitions.invoke(InvokeOnPartitions.java:73)
at com.hazelcast.spi.impl.operationservice.impl.OperationServiceImpl.invokeOnAllPartitions(OperationServiceImpl.java:371)
at com.hazelcast.map.impl.proxy.MapProxySupport.size(MapProxySupport.java:628)
at com.hazelcast.map.impl.proxy.MapProxyImpl.size(MapProxyImpl.java:102)
at it.XXXX.tbx.server.MapLoader.run(MapLoader.java:36)
Regards,
Tharinda
If you are trying to control how long you wait for the result of e.g. a map.get, you could have a look at the asynchronous version, map.getAsync. It returns a future, and you can control how long you want to wait for the result.
Modifying the call timeout is not advised.
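A rough sketch of that approach, assuming Hazelcast 3.x and the blockMap map from the stack trace (the key is hypothetical); the per-call wait is bounded by the future's get timeout instead of by the cluster-wide operation call timeout:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class GetWithDeadline {
    public static void main(String[] args) throws Exception {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, String> map = hz.getMap("blockMap");

        try {
            // Wait at most 100 ms for this particular get, without lowering
            // hazelcast.operation.call.timeout.millis for the whole cluster.
            String value = map.getAsync("some-key").get(100, TimeUnit.MILLISECONDS);
            System.out.println("value = " + value);
        } catch (TimeoutException e) {
            System.out.println("get did not complete within 100 ms");
        } finally {
            hz.shutdown();
        }
    }
}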

Apache-Flink Quickstart - reading CSV file error : Futures timed out after [10000 milliseconds]

I want to read a CSV file locally using the Flink API, with the following code:
String csvPath = "data/weather.csv";
List<Tuple2<String, Double>> csv = env.readCsvFile(csvPath)
        .types(String.class, Double.class).collect();
I tried files of different sizes (from 800 MB to 6 GB). Sometimes the operation completes successfully and sometimes it does not, failing with the following timeout exception:
Exception in thread "main" java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)
at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:169)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.ready(package.scala:169)
at org.apache.flink.runtime.minicluster.FlinkMiniCluster.shutdown(FlinkMiniCluster.scala:439)
at org.apache.flink.runtime.minicluster.FlinkMiniCluster.stop(FlinkMiniCluster.scala:408)
at org.apache.flink.client.LocalExecutor.stop(LocalExecutor.java:127)
at org.apache.flink.client.LocalExecutor.executePlan(LocalExecutor.java:195)
at org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:91)
at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:923)
at org.apache.flink.api.java.DataSet.collect(DataSet.java:410)
at org.apache.flink.simpleCSV.run(simpleCSV.java:83)
How can I fix this problem? Can I increase this timeout programmatically, or should I put a config file somewhere? Is there a specific heap size that I should set based on the file size?
collect() transfers the data from the cluster to the local client. It only works for very small data sets (< 10 MB).
If you have larger data sets, you need to process them on the cluster and emit the results through an output format, e.g. write them to a file.
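For example, instead of collect(), the job can write its result with an output format and bring nothing back to the client. A minimal sketch, assuming the DataSet API used above and a hypothetical output path:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class CsvToFile {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<String, Double>> csv = env
                .readCsvFile("data/weather.csv")
                .types(String.class, Double.class);

        // Emit on the cluster instead of pulling everything back with collect().
        csv.writeAsCsv("data/weather-out.csv");

        env.execute("read weather csv");
    }
}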
If you are debugging this program, you can set a breakpoint at the constructor of org.apache.flink.api.java.LocalEnvironment (the constructor that takes a config) and run the following command to change the timeout to 200 seconds (Alt+F8 in IntelliJ IDEA):
config.setString("akka.ask.timeout", "200 s")
To find the LocalEnvironment class in IntelliJ IDEA, press Ctrl+N, check "Include non-project classes" in the pop-up window, then type "LocalEnvironment" in the edit box.
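If you would rather not go through the debugger, the same setting can be passed programmatically when the local environment is created. A sketch assuming an older Flink release where akka.ask.timeout is the relevant key:

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class LocalEnvWithTimeout {
    public static void main(String[] args) throws Exception {
        // Create the local environment with an explicit configuration
        // instead of patching it at a breakpoint.
        Configuration config = new Configuration();
        config.setString("akka.ask.timeout", "200 s");
        ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(config);

        // ... build and execute the job with env as usual ...
    }
}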

All host(s) tried for query failed - com.datastax.driver.core.exceptions.OperationTimedOutException- Operation timed out)

We have set up a 3-node Cassandra cluster on DigitalOcean and have written a Java program that uses the Java CQL driver to connect to Cassandra. The queries run fine for a while, but after some time we get the following exception:
Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /128.199.98.201:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [/128.199.98.201] Operation timed out))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:231)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1402)
at com.datastax.driver.core.Cluster.init(Cluster.java:164)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:343)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:316)
at com.datastax.driver.core.Cluster.connect(Cluster.java:254)
at com.attinad.cantiz.iot.platform.cassandrasample.PagingExample.connect(PagingExample.java:24)
at com.attinad.cantiz.iot.platform.cassandrasample.App.main(App.java:31)
The various timeout values in cassandra.yaml are given below:
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
Any idea whether the issue is timeout-related or a coding problem? Any help is appreciated.
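One thing worth checking: OperationTimedOutException is raised by the client-side driver, not by the server-side limits in cassandra.yaml, so the driver's own socket timeouts also matter here. A minimal sketch, assuming the DataStax Java driver 2.x/3.x, of raising those timeouts when building the Cluster (the contact point is the one from the stack trace; the values are illustrative):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SocketOptions;

public class ConnectWithDriverTimeouts {
    public static void main(String[] args) {
        // Per-request read timeout and connection timeout on the driver side.
        SocketOptions socketOptions = new SocketOptions()
                .setConnectTimeoutMillis(10_000)
                .setReadTimeoutMillis(30_000);

        Cluster cluster = Cluster.builder()
                .addContactPoint("128.199.98.201")
                .withSocketOptions(socketOptions)
                .build();
        try {
            Session session = cluster.connect();
            System.out.println("Connected to cluster: " + cluster.getMetadata().getClusterName());
            session.close();
        } finally {
            cluster.close();
        }
    }
}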

JMeter test on Netty-based impl produces error for every second request

I've implemented an HTTP service based on the HTTP server example provided by the netty.io project.
When I execute a GET request to the service URL from the command line (wget) or from a browser, I receive a result as expected.
When I perform a load test using ApacheBench (ab -n 100000 -c 8 http://localhost:9000/path/to/service), I experience no errors (neither on the service side nor on the ab side) and see fair numbers for request processing duration.
Afterwards, I set up a test plan in JMeter with a thread group of 1 thread and a loop count of 2. I added an HTTP Request sampler where I simply set the server name localhost, the port number 9000 and the path /path/to/service. Then I also added a View Results Tree and a Summary Report listener.
Finally, I executed the test plan and received one valid response and one error showing the following content:
Thread Name: Thread Group 1-1
Sample Start: 2015-06-04 09:23:12 CEST
Load time: 0
Connect Time: 0
Latency: 0
Size in bytes: 2068
Headers size in bytes: 0
Body size in bytes: 2068
Sample Count: 1
Error Count: 1
Response code: Non HTTP response code: org.apache.http.NoHttpResponseException
Response message: Non HTTP response message: The target server failed to respond
Response headers:
HTTPSampleResult fields:
ContentType:
DataEncoding: null
The associated exception found in response data tab showed the following content
org.apache.http.NoHttpResponseException: The target server failed to respond
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:95)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:61)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
at org.apache.jmeter.protocol.http.sampler.MeasuringConnectionManager$MeasuredConnection.receiveResponseHeader(MeasuringConnectionManager.java:201)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:715)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl.executeRequest(HTTPHC4Impl.java:517)
at org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl.sample(HTTPHC4Impl.java:331)
at org.apache.jmeter.protocol.http.sampler.HTTPSamplerProxy.sample(HTTPSamplerProxy.java:74)
at org.apache.jmeter.protocol.http.sampler.HTTPSamplerBase.sample(HTTPSamplerBase.java:1146)
at org.apache.jmeter.protocol.http.sampler.HTTPSamplerBase.sample(HTTPSamplerBase.java:1135)
at org.apache.jmeter.threads.JMeterThread.process_sampler(JMeterThread.java:434)
at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:261)
at java.lang.Thread.run(Thread.java:745)
As I have a similar service already running which receives and processes web tracking data and shows no errors, it might be a problem with my test plan or with JMeter... but I am not sure :-(
Did anyone experience similar behavior? Thanks in advance ;-)
The issue can be related to Keep-Alive management.
Read these:
https://bz.apache.org/bugzilla/show_bug.cgi?id=57921
https://wiki.apache.org/jmeter/JMeterSocketClosed
So your solution is one of the following.
If you're sure it's a keep-alive issue, try a JMeter nightly build (http://jmeter.apache.org/nightly.html):
Download the _bin and _lib files.
Unpack the archives into the same directory structure (the other archives are not needed to run JMeter).
Then adapt the value of httpclient4.idletimeout.
Alternatively, a workaround is to increase the retry count or add a connection stale check, as described in:
https://wiki.apache.org/jmeter/JMeterSocketClosed
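On the server side it is also worth confirming that the Netty handler honours keep-alive, i.e. that it either sends Connection: keep-alive with a correct Content-Length or closes the channel after writing; a connection the server silently closes is exactly what HttpClient reports as NoHttpResponseException when JMeter tries to reuse it. A sketch of the relevant branch, assuming Netty 4.1 and a handler shaped like the netty.io example (this is not the OP's actual handler):

import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.handler.codec.http.DefaultFullHttpResponse;
import io.netty.handler.codec.http.FullHttpRequest;
import io.netty.handler.codec.http.FullHttpResponse;
import io.netty.handler.codec.http.HttpHeaderNames;
import io.netty.handler.codec.http.HttpHeaderValues;
import io.netty.handler.codec.http.HttpResponseStatus;
import io.netty.handler.codec.http.HttpUtil;
import io.netty.handler.codec.http.HttpVersion;
import io.netty.util.CharsetUtil;

public class KeepAliveAwareHandler extends SimpleChannelInboundHandler<FullHttpRequest> {
    @Override
    protected void channelRead0(ChannelHandlerContext ctx, FullHttpRequest request) {
        FullHttpResponse response = new DefaultFullHttpResponse(
                HttpVersion.HTTP_1_1, HttpResponseStatus.OK,
                Unpooled.copiedBuffer("ok", CharsetUtil.UTF_8));
        // Content-Length tells the client where the response ends on a reused connection.
        response.headers().set(HttpHeaderNames.CONTENT_TYPE, "text/plain");
        response.headers().setInt(HttpHeaderNames.CONTENT_LENGTH, response.content().readableBytes());

        if (HttpUtil.isKeepAlive(request)) {
            // Keep the connection open and say so explicitly.
            response.headers().set(HttpHeaderNames.CONNECTION, HttpHeaderValues.KEEP_ALIVE);
            ctx.writeAndFlush(response);
        } else {
            // Close after the response so the client does not reuse a dead connection.
            ctx.writeAndFlush(response).addListener(ChannelFutureListener.CLOSE);
        }
    }
}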
