MongoDB Atlas "Got Socket exception on Connection To Cluster" - Java

I'm using Java and Spring Boot with MongoDB Atlas, and I created a database that backs CRUD operations for many objects.
When I do the POST for uploadingImage, I get this error: Got Socket exception on Connection [connectionId{localValue:4, serverValue:114406}] to cluster0-shard-00-02.1c6kg.mongodb.net:27017
However, when I call the other objects' CRUD endpoints, they work totally fine, and I don't know why this one raises the exception. All my CRUD operations for all objects also work well on localhost when not connecting to MongoDB Atlas, so my ImageDAO should be fine; it just calls mongoTemplate.insert(image).
I searched online, and the suggestions said it might be the Atlas IP whitelist, so I opened my cluster to any IP address.
I also set my timeout and socket configuration like this in my .properties file:
spring.data.mongodb.uri=mongodb+srv://username:password@cluster0.1c6kg.mongodb.net/database?retryWrites=true&w=majority&keepAlive=true&pooSize=30&autoReconnect=true&socketTimeoutMS=361000000&connectTimeoutMS=3600000
It still doesn't work. I think the problem is definitely related to the socket timeout, but I don't know where else I can configure it.
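Independent of the socket error itself, the connection string in the question contains option names the driver will not recognize: pooSize should be maxPoolSize, and keepAlive/autoReconnect are legacy options that recent drivers typically ignore in the URI (usually with only a warning logged). A small stdlib sketch of that kind of sanity check; UriOptionCheck is a hypothetical helper and its known-option list is deliberately incomplete:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class UriOptionCheck {
    // Option names the driver recognizes (illustrative subset, not exhaustive).
    static final List<String> KNOWN = List.of(
            "retryWrites", "w", "maxPoolSize", "socketTimeoutMS", "connectTimeoutMS");

    /** Returns option names in the URI's query string that are not in the known set. */
    static List<String> unknownOptions(String uri) {
        int q = uri.indexOf('?');
        if (q < 0) return List.of();
        return Arrays.stream(uri.substring(q + 1).split("&"))
                .map(pair -> pair.split("=", 2)[0])
                .filter(name -> KNOWN.stream().noneMatch(k -> k.equalsIgnoreCase(name)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String uri = "mongodb+srv://user:pw@cluster0.example.mongodb.net/db"
                + "?retryWrites=true&w=majority&keepAlive=true&pooSize=30"
                + "&autoReconnect=true&socketTimeoutMS=361000000&connectTimeoutMS=3600000";
        System.out.println(unknownOptions(uri)); // [keepAlive, pooSize, autoReconnect]
    }
}
```

Flagging these early is cheaper than discovering at runtime that an option you thought was set (like the pool size) was silently ignored.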

The Error log is here:
2020-11-01 12:25:34.275 WARN 20242 --- [nio-8088-exec-1] org.mongodb.driver.connection : Got socket exception on connection [connectionId{localValue:4, serverValue:114406}] to cluster0-shard-00-02.1c6kg.mongodb.net:27017. All connections to cluster0-shard-00-02.1c6kg.mongodb.net:27017 will be closed.
2020-11-01 12:25:34.283 INFO 20242 --- [nio-8088-exec-1] org.mongodb.driver.connection : Closed connection [connectionId{localValue:4, serverValue:114406}] to cluster0-shard-00-02.1c6kg.mongodb.net:27017 because there was a socket exception raised by this connection.
2020-11-01 12:25:34.295 INFO 20242 --- [nio-8088-exec-1] org.mongodb.driver.cluster : No server chosen by WritableServerSelector from cluster description ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, serverDescriptions=[ServerDescription{address=cluster0-shard-00-00.1c6kg.mongodb.net:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, minWireVersion=0, maxWireVersion=8, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=46076648, setName='atlas-d9ovwb-shard-0', canonicalAddress=cluster0-shard-00-00.1c6kg.mongodb.net:27017, hosts=[cluster0-shard-00-00.1c6kg.mongodb.net:27017, cluster0-shard-00-01.1c6kg.mongodb.net:27017, cluster0-shard-00-02.1c6kg.mongodb.net:27017], passives=[], arbiters=[], primary='cluster0-shard-00-02.1c6kg.mongodb.net:27017', tagSet=TagSet{[Tag{name='nodeType', value='ELECTABLE'}, Tag{name='provider', value='AWS'}, Tag{name='region', value='US_WEST_2'}, Tag{name='workloadType', value='OPERATIONAL'}]}, electionId=null, setVersion=1, lastWriteDate=Sun Nov 01 12:25:29 PST 2020, lastUpdateTimeNanos=104428017411386}, ServerDescription{address=cluster0-shard-00-02.1c6kg.mongodb.net:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=cluster0-shard-00-01.1c6kg.mongodb.net:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, minWireVersion=0, maxWireVersion=8, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=41202444, setName='atlas-d9ovwb-shard-0', canonicalAddress=cluster0-shard-00-01.1c6kg.mongodb.net:27017, hosts=[cluster0-shard-00-00.1c6kg.mongodb.net:27017, cluster0-shard-00-01.1c6kg.mongodb.net:27017, cluster0-shard-00-02.1c6kg.mongodb.net:27017], passives=[], arbiters=[], primary='cluster0-shard-00-02.1c6kg.mongodb.net:27017', tagSet=TagSet{[Tag{name='nodeType', value='ELECTABLE'}, Tag{name='provider', value='AWS'}, Tag{name='region', value='US_WEST_2'}, Tag{name='workloadType', value='OPERATIONAL'}]}, electionId=null, setVersion=1, 
lastWriteDate=Sun Nov 01 12:25:29 PST 2020, lastUpdateTimeNanos=104428010234368}]}. Waiting for 30000 ms before timing out
2020-11-01 12:25:34.316 INFO 20242 --- [ngodb.net:27017] org.mongodb.driver.cluster : Discovered replica set primary cluster0-shard-00-02.1c6kg.mongodb.net:27017
2020-11-01 12:25:34.612 INFO 20242 --- [nio-8088-exec-1] org.mongodb.driver.connection : Opened connection [connectionId{localValue:5, serverValue:108547}] to cluster0-shard-00-02.1c6kg.mongodb.net:27017
2020-11-01 12:25:34.838 WARN 20242 --- [nio-8088-exec-1] org.mongodb.driver.connection : Got socket exception on connection [connectionId{localValue:5, serverValue:108547}] to cluster0-shard-00-02.1c6kg.mongodb.net:27017. All connections to cluster0-shard-00-02.1c6kg.mongodb.net:27017 will be closed.
2020-11-01 12:25:34.838 INFO 20242 --- [nio-8088-exec-1] org.mongodb.driver.connection : Closed connection [connectionId{localValue:5, serverValue:108547}] to cluster0-shard-00-02.1c6kg.mongodb.net:27017 because there was a socket exception raised by this connection.
2020-11-01 12:25:34.876 INFO 20242 --- [ngodb.net:27017] org.mongodb.driver.cluster : Discovered replica set primary cluster0-shard-00-02.1c6kg.mongodb.net:27017
2020-11-01 12:25:34.878 ERROR 20242 --- [nio-8088-exec-1] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.springframework.data.mongodb.UncategorizedMongoDbException: Exception sending message; nested exception is com.mongodb.MongoSocketWriteException: Exception sending message] with root cause

Related

Tableau Server (Linux) - tabadmincontroller crashing due to value 'warn'

Recently I was tasked with performing the following actions on our (single-node) Tableau Server:
Harden the installation as per https://help.tableau.com/current/server/en-us/security_harden.htm
Increase the disk size of the attached volume (AWS EBS)
Change the instance type of the server.
Having no previous experience with Tableau, I:
Logged into the EC2 instance and ran:
tsm configuration set -k ssl.protocols -v "all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1"
tsm pending-changes apply
Waited for the changes to apply, and stopped the server
Increased the EBS volume size
Changed the EC2 instance type
Restarted the server.
Upon restart of the server, it took a while for everything to load, and everything appeared to be okay... except for the admin panel and the tsm command.
Running tsm status revealed:
$ tsm status
Could not connect to server. Make sure that Tableau Server is running and try again.
Okay. I checked the syslog (/var/log/syslog), which pointed me towards tabadmincontroller as per:
systemd[1172]: Started Tableau Server Administration Controller.
tabadmincontroller_0[11354]: [11354] [INFO] 2021-11-24 13:13:47.106 +0000 : Loading configuration from /var/opt/tableau/tableau_server/data/tabsvc/services/tabadmincontroller_0.20204.20.1116.1810/bin/tabadmincontroller.runjavaservice.json
tabadmincontroller_0[11354]: [11354] [INFO] 2021-11-24 13:13:47.107 +0000 : Loading configuration from /var/opt/tableau/tableau_server/data/tabsvc/services/tabadmincontroller_0.20204.20.1116.1810/config/tabadmincontroller.runjavaservice.json
tabadmincontroller_0[11354]: [11354] [INFO] 2021-11-24 13:13:47.107 +0000 : Loading manifest from /var/opt/tableau/tableau_server/data/tabsvc/services/tabadmincontroller_0.20204.20.1116.1810/bin/tabadmincontroller.jar
tabadmincontroller_0[11354]: [11354] [INFO] 2021-11-24 13:13:47.107 +0000 : Starting malloc_trim thread. Run every 60 sec. Heap pad MB: 1
tabadmincontroller_0[11354]: [11354] [INFO] 2021-11-24 13:13:47.109 +0000 : Loading JVM library /var/opt/tableau/tableau_server/data/tabsvc/services/tabadmincontroller_0.20204.20.1116.1810/repository/jre/lib/server/libjvm.so
tabadmincontroller_0[11354]: [11354] [INFO] 2021-11-24 13:13:47.192 +0000 : Java class name: com.tableausoftware.tabadmin.webapp.TabadminController; Method name: main; Arguments: run
nlp_0[10759]: time="2021-11-24T13:13:54.323621681Z" level=info msg="[core] grpc: Server.Serve failed to create ServerTransport: connection error: desc = \"transport: http2Server.HandleStreams failed to receive the preface from client: EOF\"" logger="pegasus/server:server" system=system
nlp_0[10759]: time="2021-11-24T13:14:05.775776702Z" level=info msg="get job counts" blockedJobCount=0 inProgressJobCount=0 logger="semanticmodel/metrics:Monitor" queuedJobCount=0 waitingJobCount=0
systemd[1172]: tabadmincontroller_0.service: Main process exited, code=exited, status=1/FAILURE
systemd[1172]: tabadmincontroller_0.service: Failed with result 'exit-code'.
systemd[1172]: tabadmincontroller_0.service: Service hold-off time over, scheduling restart.
systemd[1172]: tabadmincontroller_0.service: Scheduled restart job, restart counter is at 7.
systemd[1172]: Stopped Tableau Server Administration Controller.
systemd[1172]: Started Tableau Server Administration Controller.
This crash/restart loop continues indefinitely.
Checking the logs at /var/opt/tableau/tableau_server/data/tabsvc/logs/tabadmincontroller shows the following:
2021-11-24 13:17:37.372 +0000 main : INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=ip-172-33-30-206:8591 sessionTimeout=120000 watcher=org.apache.curator.ConnectionState#3657201c
2021-11-24 13:17:37.381 +0000 main : INFO org.apache.zookeeper.common.X509Util - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2021-11-24 13:17:37.387 +0000 main : INFO org.apache.zookeeper.ClientCnxnSocket - jute.maxbuffer value is 4194304 Bytes
2021-11-24 13:17:37.397 +0000 main : INFO org.apache.zookeeper.ClientCnxn - zookeeper.request.timeout value is 0. feature enabled=
2021-11-24 13:17:37.402 +0000 main : INFO org.apache.curator.framework.imps.CuratorFrameworkImpl - Default schema
2021-11-24 13:17:37.408 +0000 main-SendThread(ip-172-33-30-206:8591) : INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server ip-address-redacted. Will not attempt to authenticate using SASL (unknown error)
main-SendThread(ip-redacted:8591) : INFO org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /ip-redacted:45082, server: ip/ip:8591
2021-11-24 13:17:37.423 +0000 main-SendThread(ip:8591) : INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server, sessionid = 0x10000005ed60050, negotiated timeout = 120000
2021-11-24 13:17:37.432 +0000 main-EventThread : INFO org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED
2021-11-24 13:17:37.780 +0000 main : ERROR com.tableausoftware.telemetry.helper.Environment - [xdev-edpa-007] process_type is not present in the configuration
2021-11-24 13:17:38.298 +0000 ZKWorker-ScheduledTask-0 : INFO org.apache.curator.framework.imps.CuratorFrameworkImpl - Starting
2021-11-24 13:17:38.298 +0000 ZKWorker-ScheduledTask-0 : INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=ip sessionTimeout=120000 watcher=org.apache.curator.ConnectionState#48dddf5a
2021-11-24 13:17:38.298 +0000 ZKWorker-ScheduledTask-0 : INFO org.apache.zookeeper.ClientCnxnSocket - jute.maxbuffer value is 4194304 Bytes
2021-11-24 13:17:38.299 +0000 ZKWorker-ScheduledTask-0 : INFO org.apache.zookeeper.ClientCnxn - zookeeper.request.timeout value is 0. feature enabled=
2021-11-24 13:17:38.300 +0000 ZKWorker-ScheduledTask-0 : INFO org.apache.curator.framework.imps.CuratorFrameworkImpl - Default schema
2021-11-24 13:17:38.300 +0000 ZKWorker-ScheduledTask-0-SendThread(ip:8591) : INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server ip/ip:8591. Will not attempt to authenticate using SASL (unknown error)
2021-11-24 13:17:38.301 +0000 ZKWorker-ScheduledTask-0-SendThread(ip:8591) : INFO org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: ip:45088, server: ip/ip:8591
2021-11-24 13:17:38.303 +0000 ZKWorker-ScheduledTask-0-SendThread(ip-172-33-30-206:8591) : INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server ip-172-33-30-206/ip-address:8591, sessionid = 0x10000005ed60051, negotiated timeout = 120000
2021-11-24 13:17:38.303 +0000 ZKWorker-ScheduledTask-0-EventThread : INFO org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED
2021-11-24 13:17:39.357 +0000 main : INFO org.eclipse.jetty.server.session - DefaultSessionIdManager workerName=node0
2021-11-24 13:17:39.357 +0000 main : INFO org.eclipse.jetty.server.session - No SessionScavenger set, using defaults
2021-11-24 13:17:39.358 +0000 main : INFO org.eclipse.jetty.server.session - node0 Scavenging every 660000ms
2021-11-24 13:17:39.364 +0000 main : INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.s.b.w.e.j.JettyEmbeddedWebAppContext#38e0ccf3{application,/,[file:///var/opt/tableau/tableau_server/data/tabsvc/temp/tabadmincontroller_0.20204.20.1116.1810/jetty-docbase.7353433006274726961.8850/],AVAILABLE}
2021-11-24 13:17:39.365 +0000 main : INFO org.eclipse.jetty.server.Server - Started #9503ms
2021-11-24 13:17:39.755 +0000 main : INFO org.apache.curator.framework.imps.CuratorFrameworkImpl - Starting
2021-11-24 13:17:39.756 +0000 main : INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=ip-172-33-30-206:8591 sessionTimeout=120000 watcher=org.apache.curator.ConnectionState#6376534c
2021-11-24 13:17:39.756 +0000 main : INFO org.apache.zookeeper.ClientCnxnSocket - jute.maxbuffer value is 4194304 Bytes
2021-11-24 13:17:39.757 +0000 main : INFO org.apache.zookeeper.ClientCnxn - zookeeper.request.timeout value is 0. feature enabled=
2021-11-24 13:17:39.757 +0000 main : INFO org.apache.curator.framework.imps.CuratorFrameworkImpl - Default schema
2021-11-24 13:17:39.758 +0000 main-SendThread(ip-172-33-30-206:8591) : INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server ip-172-33-30-206/ip-address:8591. Will not attempt to authenticate using SASL (unknown error)
2021-11-24 13:17:39.758 +0000 main-SendThread(ip-172-33-30-206:8591) : INFO org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /ip-address:45098, server: ip-172-33-30-206/ip-address:8591
2021-11-24 13:17:39.760 +0000 main-SendThread(ip-172-33-30-206:8591) : INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server ip-172-33-30-206/ip-address:8591, sessionid = 0x10000005ed60052, negotiated timeout = 120000
2021-11-24 13:17:39.760 +0000 main-EventThread : INFO org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED
2021-11-24 13:17:50.803 +0000 main : WARN org.springframework.boot.web.servlet.context.AnnotationConfigServletWebServerApplicationContext - Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'webSecurityConfig' defined in URL [jar:file:/opt/tableau/tableau_server/packages/bin.20204.20.1116.1810/tabadmincontroller.jar!/com/tableausoftware/tabadmin/webapp/WebSecurityConfig.class]: Unsatisfied dependency expressed through constructor parameter 1; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'cookieAuthenticationService' defined in URL [jar:file:/opt/tableau/tableau_server/packages/bin.20204.20.1116.1810/tabadmincontroller.jar!/com/tableausoftware/tabadmin/webapp/auth/CookieAuthenticationService.class]: Bean instantiation via constructor failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.tableausoftware.tabadmin.webapp.auth.CookieAuthenticationService]: Constructor threw exception; nested exception is org.springframework.core.convert.ConversionFailedException: Failed to convert from type [java.lang.String] to type [int] for value 'warn'; nested exception is java.lang.NumberFormatException: For input string: "warn"
2021-11-24 13:17:50.806 +0000 Curator-Framework-0 : INFO org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
2021-11-24 13:17:50.910 +0000 main : INFO org.apache.zookeeper.ZooKeeper - Session: 0x10000005ed60052 closed
2021-11-24 13:17:50.910 +0000 main-EventThread : INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x10000005ed60052
2021-11-24 13:17:50.917 +0000 Curator-Framework-0 : INFO org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
2021-11-24 13:17:50.930 +0000 Thread-15 : ERROR com.tableausoftware.tabadmin.webapp.EmbeddedLicenseActivator - Caught exception in EmbeddedLicenseActivator.
org.springframework.beans.factory.BeanCreationNotAllowedException: Error creating bean with name 'nativeApiInitializer': Singleton bean creation not allowed while singletons of this factory are in destruction (Do not request a bean from a BeanFactory in a destroy method implementation!)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:208) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:321) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:310) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1290) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
at org.springframework.context.annotation.ContextAnnotationAutowireCandidateResolver$1.getTarget(ContextAnnotationAutowireCandidateResolver.java:90) ~[spring-context-5.2.5.RELEASE.jar:5.2.5.RELEASE]
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:192) ~[spring-aop-5.2.5.RELEASE.jar:5.2.5.RELEASE]
at com.sun.proxy.$Proxy144.hasEnoughRoleLicenses(Unknown Source) ~[?:?]
at com.tableausoftware.tabadmin.webapp.impl.ProductKeyService.getLicensingState(ProductKeyService.java:373) ~[tab-tabadmin-controller-latest.jar:?]
at com.tableausoftware.tabadmin.webapp.EmbeddedLicenseActivator$1.run(EmbeddedLicenseActivator.java:43) [tabadmincontroller.jar:?]
2021-11-24 13:17:50.935 +0000 Thread-15 : ERROR com.tableausoftware.tabadmin.webapp.EmbeddedLicenseActivator - Failed to activate the OEM embedded license.
2021-11-24 13:17:51.020 +0000 main-EventThread : INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x10000005ed60050
2021-11-24 13:17:51.020 +0000 main : INFO org.apache.zookeeper.ZooKeeper - Session: 0x10000005ed60050 closed
2021-11-24 13:17:51.021 +0000 main : WARN com.tableausoftware.tabadmin.agent.zookeeper.ConnectionHolder - Object ZooKeeperWorker#74b99b0e. was not a consumer of this connection.
2021-11-24 13:17:51.022 +0000 Curator-Framework-0 : INFO org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
2021-11-24 13:17:51.124 +0000 ZKWorker-ScheduledTask-0-EventThread : INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x10000005ed60051
2021-11-24 13:17:51.124 +0000 main : INFO org.apache.zookeeper.ZooKeeper - Session: 0x10000005ed60051 closed
2021-11-24 13:17:51.142 +0000 main : INFO org.eclipse.jetty.server.session - node0 Stopped scavenging
2021-11-24 13:17:51.144 +0000 main : INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.s.b.w.e.j.JettyEmbeddedWebAppContext#38e0ccf3{application,/,[file:///var/opt/tableau/tableau_server/data/tabsvc/temp/tabadmincontroller_0.20204.20.1116.1810/jetty-docbase.7353433006274726961.8850/],UNAVAILABLE}
2021-11-24 13:17:51.154 +0000 main : INFO org.springframework.boot.autoconfigure.logging.ConditionEvaluationReportLoggingListener -
Error starting ApplicationContext. To display the conditions report re-run your application with 'debug' enabled.
2021-11-24 13:17:51.156 +0000 main : ERROR org.springframework.boot.SpringApplication - Application run failed
org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'webSecurityConfig' defined in URL [jar:file:/opt/tableau/tableau_server/packages/bin.20204.20.1116.1810/tabadmincontroller.jar!/com/tableausoftware/tabadmin/webapp/WebSecurityConfig.class]: Unsatisfied dependency expressed through constructor parameter 1; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'cookieAuthenticationService' defined in URL [jar:file:/opt/tableau/tableau_server/packages/bin.20204.20.1116.1810/tabadmincontroller.jar!/com/tableausoftware/tabadmin/webapp/auth/CookieAuthenticationService.class]: Bean instantiation via constructor failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.tableausoftware.tabadmin.webapp.auth.CookieAuthenticationService]: Constructor threw exception; nested exception is org.springframework.core.convert.ConversionFailedException: Failed to convert from type [java.lang.String] to type [int] for value 'warn'; nested exception is java.lang.NumberFormatException: For input string: "warn"
Can anyone point me in the right direction for the next steps here?
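For what it's worth, the root cause in both stack traces is a configuration value 'warn' being fed into a constructor parameter that expects an int (NumberFormatException: For input string: "warn"), most plausibly introduced by the hardening changes. One hedged way to narrow down which service config file holds that string, sketched with only the JDK; FindWarnValue is a hypothetical helper, the directory comes from the log lines above, and the build number will differ per install:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FindWarnValue {
    /** Returns the .json files under root whose text contains the string value "warn". */
    static List<Path> filesWithWarn(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files
                    .filter(p -> p.toString().endsWith(".json"))
                    .filter(p -> {
                        try {
                            return Files.readString(p).contains("\"warn\"");
                        } catch (IOException e) {
                            return false; // unreadable file: skip it
                        }
                    })
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Directory taken from the log lines above; adjust the build number to your install.
        Path root = Path.of("/var/opt/tableau/tableau_server/data/tabsvc/services");
        filesWithWarn(root).forEach(System.out::println);
    }
}
```

A plain grep -R '"warn"' over the same directory does the same job; the point is to find the key whose value should be an integer before trying to repair the configuration.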

Kafka on the server failed the next day (EndOfStreamException: Unable to read additional data from client)

The Kafka service I deployed on the server passed testing a few days ago and was running normally; no other operations were carried out afterwards.
Then service calls failed today. It had been working fine for days, but this afternoon the service went down. Looking at the log, it seems the Kafka broker restarted automatically and then ran into a conflict over its ZooKeeper broker ID.
After restarting the service, everything is normal again. Google doesn't turn up a good solution. What is the reason?
The relevant log is below; please take a look, thank you.
[2021-11-06 14:43:29,139] WARN Unexpected exception (org.apache.zookeeper.server.NIOServerCnxn)
EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /159.89.4.236:59168, session = 0x0
at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:163)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:326)
at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2021-11-06 15:00:18,301] INFO Completed load of Log(dir=/tmp/kafka-logs/__consumer_offsets-46, topicId=dPs0Kq4ERHKDaR440Nt8qA, topic=__consumer_offsets, partition=46, highWatermark=0, lastStableOffset=0, logStartOffset=0, logEndOffset=0) with 1 segments in 20ms (53/53 loaded in /tmp/kafka-logs) (kafka.log.LogManager)
[2021-11-06 15:00:18,303] INFO Loaded 53 logs in 2350ms. (kafka.log.LogManager)
[2021-11-06 15:00:18,303] INFO Starting log cleanup with a period of 300000 ms. (kafka.log.LogManager)
[2021-11-06 15:00:18,304] INFO Starting log flusher with a default period of 9223372036854775807 ms. (kafka.log.LogManager)
[2021-11-06 15:00:18,838] INFO [BrokerToControllerChannelManager broker=0 name=forwarding]: Starting (kafka.server.BrokerToControllerRequestThread)
[2021-11-06 15:00:19,148] INFO Updated connection-accept-rate max connection creation rate to 2147483647 (kafka.network.ConnectionQuotas)
[2021-11-06 15:00:19,165] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.Acceptor)
[2021-11-06 15:00:19,237] INFO [SocketServer listenerType=ZK_BROKER, nodeId=0] Created data-plane acceptor and processors for endpoint : ListenerName(PLAINTEXT) (kafka.network.SocketServer)
[2021-11-06 15:00:19,282] INFO [BrokerToControllerChannelManager broker=0 name=alterIsr]: Starting (kafka.server.BrokerToControllerRequestThread)
[2021-11-06 15:00:19,400] INFO [ExpirationReaper-0-DeleteRecords]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2021-11-06 15:00:19,400] INFO [ExpirationReaper-0-Fetch]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2021-11-06 15:00:19,400] INFO [ExpirationReaper-0-Produce]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2021-11-06 15:00:19,409] INFO [ExpirationReaper-0-ElectLeader]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2021-11-06 15:00:19,451] INFO [LogDirFailureHandler]: Starting (kafka.server.ReplicaManager$LogDirFailureHandler)
[2021-11-06 15:00:19,530] INFO Creating /brokers/ids/0 (is it secure? false) (kafka.zk.KafkaZkClient)
[2021-11-06 15:00:19,566] ERROR Error while creating ephemeral at /brokers/ids/0, node already exists and owner '72057654275276801' does not match current session '72057654275276802' (kafka.zk.KafkaZkClient$CheckedEphemeral)
[2021-11-06 15:00:19,572] ERROR [KafkaServer id=0] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
at org.apache.zookeeper.KeeperException.create(KeeperException.java:126)
at kafka.zk.KafkaZkClient$CheckedEphemeral.getAfterNodeExists(KafkaZkClient.scala:1904)
at kafka.zk.KafkaZkClient$CheckedEphemeral.create(KafkaZkClient.scala:1842)
at kafka.zk.KafkaZkClient.checkedEphemeralCreate(KafkaZkClient.scala:1809)
at kafka.zk.KafkaZkClient.registerBroker(KafkaZkClient.scala:96)
at kafka.server.KafkaServer.startup(KafkaServer.scala:319)
at kafka.Kafka$.main(Kafka.scala:109)
at kafka.Kafka.main(Kafka.scala)
[2021-11-06 15:00:19,596] INFO [KafkaServer id=0] shutting down (kafka.server.KafkaServer)
[2021-11-06 15:00:19,597] INFO [SocketServer listenerType=ZK_BROKER, nodeId=0] Stopping socket server request processors (kafka.network.SocketServer)
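Two details in the log are worth noting. First, the data directory is /tmp/kafka-logs; anything under /tmp can be cleaned up by the OS between runs. Second, the NodeExistsException means the broker restarted before its previous ZooKeeper session had expired: that old session still owned the ephemeral /brokers/ids/0 node, so re-registration collided with it until the session timed out. A hedged server.properties sketch addressing both; the key names are standard broker settings, but the path and values are illustrative:

```properties
# server.properties (sketch; path and values are illustrative)
broker.id=0
# Keep log data out of /tmp so OS cleanup cannot remove it between runs.
log.dirs=/var/lib/kafka-logs
# The ephemeral /brokers/ids/0 node survives this long after the old session
# dies; a broker restart within that window can hit NodeExists.
zookeeper.session.timeout.ms=18000
```

After moving log.dirs, the existing data under /tmp/kafka-logs would need to be copied over (or the broker allowed to start fresh) before restarting.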

Resetting heartbeat timestamps because of huge system clock jump

I am running a Hazelcast application, and I get the error below after keeping my machine in sleep mode / logged off for some time.
2021-06-21 14:41:07.854 INFO 8288 --- [cached.thread-7] c.h.i.c.impl.ClusterHeartbeatManager
: [192.168.181.51]:5701 [APP] [4.2] System clock apparently jumped from 2021-06-21 14:10:28.569 to 2021-06-21 14:41:07.832 since last heartbeat (+1834263 ms)
2021-06-21 14:41:07.855 INFO 8288 --- [cached.thread-9] c.h.i.server.tcp.TcpServerConnection : [192.168.181.51]:5701 [APP] [4.2] Connection[id=1, /127.0.0.1:5701->/127.0.0.1:5702, qualifier=null, endpoint=[127.0.0.1]:5702, alive=false, connectionType=JVM, planeIndex=-1] closed. Reason: Client heartbeat is timed out, closing connection to Connection[id=1, /127.0.0.1:5701->/127.0.0.1:5702, qualifier=null, endpoint=[127.0.0.1]:5702, alive=true, connectionType=JVM, planeIndex=-1]. Now: 2021-06-21 14:41:07.833. LastTimePacketReceived: 2021-06-21 14:10:29.314
2021-06-21 14:41:07.915 WARN 8288 --- [cached.thread-7] c.h.i.c.impl.ClusterHeartbeatManager : [192.168.181.51]:5701 [APP] [4.2] Resetting heartbeat timestamps because of huge system clock jump! Clock-Jump: 1834263 ms, Heartbeat-Timeout: 60000 ms
2021-06-21 14:41:08.208 WARN 8288 --- [onMonitorThread] c.h.s.i.o.impl.InvocationMonitor : [192.168.181.51]:5701 [APP] [4.2] MonitorInvocationsTask delayed 1836451 ms
2021-06-21 14:41:08.213 WARN 8288 --- [onMonitorThread] c.h.s.i.o.impl.InvocationMonitor : [192.168.181.51]:5701 [APP] [4.2] BroadcastOperationControlTask delayed 1834623 ms
2021-06-21 14:41:08.539 INFO 8288 --- [cached.thread-9] c.h.i.server.tcp.TcpServerConnection : [192.168.181.51]:5701 [APP] [4.2] Connection[id=2, /127.0.0.1:5701->/127.0.0.1:5703, qualifier=null, endpoint=[127.0.0.1]:5703, alive=false, connectionType=JVM, planeIndex=-1] closed. Reason: Client heartbeat is timed out, closing connection to Connection[id=2, /127.0.0.1:5701->/127.0.0.1:5703, qualifier=null, endpoint=[127.0.0.1]:5703, alive=true, connectionType=JVM, planeIndex=-1]. Now: 2021-06-21 14:41:08.539. LastTimePacketReceived: 2021-06-21 14:10:29.949
2021-06-21 14:41:08.551 WARN 8288 --- [ached.thread-36] c.h.i.cluster.impl.MulticastService : [192.168.181.51]:5701 [APP] [4.2] Sending multicast datagram failed. Exception message saying the operation is not permitted usually means the underlying OS is not able to send packets at a given pace. It can be caused by starting several hazelcast members in parallel when the members send their join message nearly at the same time.
java.net.NoRouteToHostException: No route to host: Datagram send failed
at java.net.TwoStacksPlainDatagramSocketImpl.send(Native Method) ~[na:1.8.0_251]
at java.net.DatagramSocket.send(Unknown Source) ~[na:1.8.0_251]
at com.hazelcast.internal.cluster.impl.MulticastService.send(MulticastService.java:291) ~[hazelcast-all-4.2.jar!/:4.2]
at com.hazelcast.internal.cluster.impl.MulticastJoiner.searchForOtherClusters(MulticastJoiner.java:113) [hazelcast-all-4.2.jar!/:4.2]
at com.hazelcast.internal.cluster.impl.SplitBrainHandler.searchForOtherClusters(SplitBrainHandler.java:75) [hazelcast-all-4.2.jar!/:4.2]
at com.hazelcast.internal.cluster.impl.SplitBrainHandler.run(SplitBrainHandler.java:42) [hazelcast-all-4.2.jar!/:4.2]
at com.hazelcast.spi.impl.executionservice.impl.DelegateAndSkipOnConcurrentExecutionDecorator$DelegateDecorator.run(DelegateAndSkipOnConcurrentExecutionDecorator.java:77) [hazelcast-all-4.2.jar!/:4.2]
at com.hazelcast.internal.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:217) [hazelcast-all-4.2.jar!/:4.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.8.0_251]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.8.0_251]
at java.lang.Thread.run(Unknown Source) [na:1.8.0_251]
at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76) [hazelcast-all-4.2.jar!/:4.2]
at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102) [hazelcast-all-4.2.jar!/:4.2]
My client config is as below:
ClientConfig clientConfig = new ClientConfig();
clientConfig.setClusterName("abc");
clientConfig.getNetworkConfig().addAddress("localhost");
clientConfig.getNetworkConfig().setSmartRouting(true);
clientConfig.getNetworkConfig().addOutboundPortDefinition("5701-5720");
ClientConnectionStrategyConfig connectionStrategyConfig = clientConfig.getConnectionStrategyConfig();
ConnectionRetryConfig connectionRetryConfig = connectionStrategyConfig.getConnectionRetryConfig();
connectionRetryConfig.setInitialBackoffMillis(1000)
.setMaxBackoffMillis(60000)
.setMultiplier(2)
.setClusterConnectTimeoutMillis(1000)
.setJitter(0.2);
HazelcastClient hc = HazelcastClient.newHazelcastClient(clientConfig);
Please let me know what configuration I have wrong, or why this is happening.
This is a common issue for socket-based applications. Ideally, you disable sleep / power-save mode. You can try using the SystemParametersInfo API:
SystemParametersInfo( SPI_SETPOWEROFFACTIVE, 0, NULL, 0 );
But this would typically be seen as ill-behaved; you should instead disable power-off during installation of your application, thereby requesting permission.
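The "huge system clock jump" the heartbeat manager reports is, in essence, the gap between elapsed wall-clock time and elapsed monotonic time across the sleep. A minimal sketch of that arithmetic; ClockJump is a hypothetical illustration, not Hazelcast's actual implementation, and the 60000 ms tolerance mirrors the Heartbeat-Timeout shown in the log:

```java
public class ClockJump {
    /** Apparent jump: wall-clock elapsed minus monotonic elapsed; 0 if within tolerance. */
    static long jumpMs(long wallDeltaMs, long monotonicDeltaMs, long toleranceMs) {
        long jump = wallDeltaMs - monotonicDeltaMs;
        return Math.abs(jump) > toleranceMs ? jump : 0;
    }

    public static void main(String[] args) {
        // Values modeled on the log: ~30.5 minutes of wall-clock time passed between
        // heartbeats while the sleeping process observed almost no monotonic time.
        System.out.println(jumpMs(1_834_263, 200, 60_000)); // prints 1834063
    }
}
```

In a live detector the two deltas would come from System.currentTimeMillis() and System.nanoTime() sampled on each heartbeat; since no heartbeats arrive during sleep, a jump larger than the timeout forces the member to reset its heartbeat timestamps and drop stale connections, exactly as the log shows.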

Spring Boot Mongo times out while connecting with the Mongo driver

While I'm working with spring-boot-starter-data-mongodb, I always get a timeout exception. The log detail is as follows.
Could anybody tell me why I always get a timeout? Thanks so much.
2019-04-01 19:08:50.255 INFO 8336 --- [168.0.101:27017] org.mongodb.driver.cluster : Exception in monitor thread while connecting to server 192.168.0.101:27017
com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message
at com.mongodb.connection.InternalStreamConnection.translateReadException(InternalStreamConnection.java:530)
at com.mongodb.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:421)
2019-04-01 19:09:15.163 DEBUG 8336 --- [nio-8888-exec-1] o.s.b.w.s.f.OrderedRequestContextFilter : Cleared thread-bound request context: org.apache.catalina.connector.RequestFacade#4ce3ddaf
2019-04-01 19:09:15.165 ERROR 8336 --- [nio-8888-exec-1] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.springframework.dao.DataAccessResourceFailureException: Timed out after 30000 ms while waiting to connect. Client view of cluster state is {type=UNKNOWN, servers=[{address=192.168.0.101:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message}, caused by {java.net.SocketTimeoutException: Read timed out}}]; nested exception is com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting to connect. Client view of cluster state is {type=UNKNOWN, servers=[{address=192.168.0.101:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message}, caused by {java.net.SocketTimeoutException: Read timed out}}]] with root cause
com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting to connect. Client view of cluster state is {type=UNKNOWN, servers=[{address=192.168.0.101:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message}, caused by {java.net.SocketTimeoutException: Read timed out}}]
at com.mongodb.connection.BaseCluster.getDescription(BaseCluster.java:167)
at com.mongodb.Mongo.getConnectedClusterDescription(Mongo.java:885)
at com.mongodb.Mongo.createClientSession(Mongo.java:877)
at com.mongodb.Mongo$3.getClientSession(Mongo.java:866)
My Spring Boot version is 2.0.8.RELEASE, and my application.yml content is:
spring:
  data:
    mongodb:
      host: 192.168.0.101
      port: 27017
      username: test
      password: test
      database: test
server:
  port: 8888
management:
  health:
    mongo:
      enabled: false
You can try this:
<dependency>
    <groupId>com.spring4all</groupId>
    <artifactId>mongodb-plus-spring-boot-starter</artifactId>
    <version>1.0.0.RELEASE</version>
</dependency>
@EnableMongoPlus
@SpringBootApplication
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}
Then you have a bunch more configuration properties to play around with :)
spring.data.mongodb.option.min-connection-per-host=0
spring.data.mongodb.option.max-connection-per-host=100
spring.data.mongodb.option.threads-allowed-to-block-for-connection-multiplier=5
spring.data.mongodb.option.server-selection-timeout=30000
spring.data.mongodb.option.max-wait-time=120000
spring.data.mongodb.option.max-connection-idle-time=0
spring.data.mongodb.option.max-connection-life-time=0
spring.data.mongodb.option.connect-timeout=10000
spring.data.mongodb.option.socket-timeout=0
spring.data.mongodb.option.socket-keep-alive=false
spring.data.mongodb.option.ssl-enabled=false
spring.data.mongodb.option.ssl-invalid-host-name-allowed=false
spring.data.mongodb.option.always-use-m-beans=false
spring.data.mongodb.option.heartbeat-socket-timeout=20000
spring.data.mongodb.option.heartbeat-connect-timeout=20000
spring.data.mongodb.option.min-heartbeat-frequency=500
spring.data.mongodb.option.heartbeat-frequency=10000
spring.data.mongodb.option.local-threshold=15
I have not tried it yet... but maybe it's worth a try.
Or look in the repo to see how to do it without the dependency in your project ;)
It is not a final solution, but you can try a longer timeout.
# The time to wait to establish a connection before timing out, in seconds.
# (default: 10)
connect_timeout: 99
If it connects successfully after changing the timeout, you should find out why it is taking so long to establish a connection and try to fix that.
If it does not connect even after setting a very long timeout, you should check your proxy and try to ping the machine where MongoDB is running.
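To narrow down whether the problem is at the network level or the driver level, a plain-socket reachability check can quickly tell you whether anything is listening on the MongoDB host and port. This is a hypothetical helper using only the JDK, no MongoDB driver required:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ReachabilityCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Replace with the host/port from your own configuration.
        System.out.println(canConnect("192.168.0.101", 27017, 2000));
    }
}
```

If this returns false, no driver timeout setting will help: fix routing, firewall, or proxy issues first.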

SolrJ hanging when connecting to zookeeper

I have a local two-instance Solr Cloud setup with a single ZooKeeper instance. I am trying to connect via SolrJ to execute a query, but my code hangs for 2 minutes or so when executing the query and then fails. I have followed the basic example on the Solr wiki. The logs/code are below:
2016-07-24 13:29:01.932 INFO 83666 --- [qtp699221219-28] org.apache.zookeeper.ZooKeeper : Initiating client connection, connectString=localhost:2181 sessionTimeout=10000 watcher=org.apache.solr.common.cloud.SolrZkClient$3#496eab9
2016-07-24 13:29:01.948 INFO 83666 --- [qtp699221219-28] o.a.solr.common.cloud.ConnectionManager : Waiting for client to connect to ZooKeeper
2016-07-24 13:29:01.953 INFO 83666 --- [localhost:2181)] org.apache.zookeeper.ClientCnxn : Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2016-07-24 13:29:01.955 INFO 83666 --- [localhost:2181)] org.apache.zookeeper.ClientCnxn : Socket connection established to localhost/127.0.0.1:2181, initiating session
2016-07-24 13:29:01.967 INFO 83666 --- [localhost:2181)] org.apache.zookeeper.ClientCnxn : Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x1561cdd875e0004, negotiated timeout = 10000
2016-07-24 13:29:01.972 INFO 83666 --- [back-3-thread-1] o.a.solr.common.cloud.ConnectionManager : Watcher org.apache.solr.common.cloud.ConnectionManager#4bb95d56 name:ZooKeeperConnection Watcher:localhost:2181 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
2016-07-24 13:29:01.972 INFO 83666 --- [qtp699221219-28] o.a.solr.common.cloud.ConnectionManager : Client is connected to ZooKeeper
2016-07-24 13:29:01.973 INFO 83666 --- [qtp699221219-28] o.apache.solr.common.cloud.SolrZkClient : Using default ZkACLProvider
2016-07-24 13:29:01.974 INFO 83666 --- [qtp699221219-28] o.a.solr.common.cloud.ZkStateReader : Updating cluster state from ZooKeeper...
2016-07-24 13:29:01.990 INFO 83666 --- [qtp699221219-28] o.a.solr.common.cloud.ZkStateReader : Loaded empty cluster properties
2016-07-24 13:29:01.995 INFO 83666 --- [qtp699221219-28] o.a.solr.common.cloud.ZkStateReader : Updated live nodes from ZooKeeper... (0) -> (2)
2016-07-24 13:31:24.653 ERROR 83666 --- [qtp699221219-28] o.a.s.client.solrj.impl.CloudSolrClient : Request to collection foo failed due to (0) java.net.ConnectException: Operation timed out, retry? 0
and my code is:
String zkHostString = "localhost:2181";
CloudSolrClient solr = new CloudSolrClient.Builder().withZkHost(zkHostString).build();
solr.setDefaultCollection("foo");
SolrQuery query = new SolrQuery();
query.set("q", "*:*");
QueryResponse response = null;
try {
    response = solr.query(query);
} catch (SolrServerException e) {
    return null;
}
// Do something with the results...
Urgh, I'm an idiot: the ZooKeeper and Solr instances are inside Docker, but the code posted above is not. So ZooKeeper reported back the Solr URLs using the Docker containers' IPs... The host needs to connect via localhost, not the Docker container IP.
E.g. ZooKeeper responds with [http://172.17.0.5:8983/solr/foo_shard1_replica2, http://172.17.0.6:8984/solr/foo_shard1_replica1]
but my code needs to call [http://localhost:8983/solr/foo_shard1_replica2, http://localhost:8984/solr/foo_shard1_replica1]
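The proper fix is to make Solr advertise addresses the client can actually reach (e.g. via Docker networking or port mapping plus a reachable hostname), rather than rewriting URLs client-side. But just to make the mismatch concrete, mapping an advertised container URL back to localhost is a one-line string rewrite (hypothetical helper, purely illustrative):

```java
public class AdvertisedHostRewrite {
    // Replaces the container-internal host in an advertised URL with a host
    // the client can actually reach. Illustrative only.
    static String rewriteHost(String advertisedUrl, String reachableHost) {
        return advertisedUrl.replaceFirst("//[^:/]+", "//" + reachableHost);
    }

    public static void main(String[] args) {
        System.out.println(rewriteHost("http://172.17.0.5:8983/solr/foo_shard1_replica2", "localhost"));
        // http://localhost:8983/solr/foo_shard1_replica2
    }
}
```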
