On a legacy production application we were having an issue where the application crashed because it ran out of connections (the default was 100 connections). As a temporary solution we decided to increase the available connections to 500, but when the application reached 200 connections it just stopped itself, with no errors in the logs, just like a simple shutdown.
I added a couple of log statements that are generated every 15 seconds to clearly see the behavior of the connections; these logs print the idle and active connection counts as well as the full object of the DataSource properties. Before the application shut down, the following logs were printed:
Datasource idle connections: 0, active connections: 200
Datasource properties: org.apache.tomcat.jdbc.pool.DataSource@20b2475a{ConnectionPool[defaultAutoCommit=null; defaultReadOnly=null; defaultTransactionIsolation=-1; defaultCatalog=null; driverClassName=com.mysql.jdbc.Driver; maxActive=500; maxIdle=500; minIdle=10; initialSize=10; maxWait=30000; testOnBorrow=true; testOnReturn=false; timeBetweenEvictionRunsMillis=5000; numTestsPerEvictionRun=0; minEvictableIdleTimeMillis=60000; testWhileIdle=false; testOnConnect=false; password=********; url=jdbc:mysql://127.0.0.1:3306/db_name?createDatabaseIfNotExist=true; username=username; validationQuery=SELECT 1; validationQueryTimeout=-1; validatorClassName=null; validationInterval=3000; accessToUnderlyingConnectionAllowed=true; removeAbandoned=false; removeAbandonedTimeout=60; logAbandoned=false; connectionProperties=null; initSQL=null; jdbcInterceptors=null; jmxEnabled=true; fairQueue=true; useEquals=true; abandonWhenPercentageFull=0; maxAge=0; useLock=false; dataSource=null; dataSourceJNDI=null; suspectTimeout=0; alternateUsernameAllowed=false; commitOnReturn=false; rollbackOnReturn=false; useDisposableConnectionFacade=true; logValidationErrors=false; propagateInterruptState=false; ignoreExceptionOnPreLoad=false; }
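For reference, the logging that produces the lines above is roughly the following (a minimal sketch, not my exact class; the component and method names are my own shorthand, and scheduling additionally needs @EnableScheduling on a configuration class):

import org.apache.tomcat.jdbc.pool.DataSource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class ConnectionPoolLogger {

    private static final Logger log = LoggerFactory.getLogger(ConnectionPoolLogger.class);

    // The Tomcat JDBC pool DataSource backing the application
    private final DataSource dataSource;

    public ConnectionPoolLogger(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Runs every 15 seconds and dumps the pool state
    @Scheduled(fixedRate = 15000)
    public void logPoolState() {
        log.info("Datasource idle connections: {}, active connections: {}",
                dataSource.getIdle(), dataSource.getActive());
        log.info("Datasource properties: {}", dataSource);
    }
}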
After that the application shut itself down, and I found the following logs, with no errors before them:
2021-02-03 20:23:02.618 INFO 1 --- [ Thread-4] ationConfigEmbeddedWebApplicationContext : Closing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@8807e25: startup date [Wed Feb 03 19:49:09 GMT 2021]; root of context hierarchy
2021-02-03 20:23:02.623 INFO 1 --- [ Thread-4] o.s.c.support.DefaultLifecycleProcessor : Stopping beans in phase 0
2021-02-03 20:23:02.643 INFO 1 --- [ Thread-4] o.s.j.e.a.AnnotationMBeanExporter : Unregistering JMX-exposed beans on shutdown
2021-02-03 20:23:02.647 INFO 1 --- [ Thread-4] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'
A couple of relevant dependencies and their versions:
org.springframework:spring-webmvc:jar:4.3.6.RELEASE:compile
org.springframework.boot:spring-boot-starter-data-jpa:jar:1.5.1.RELEASE:compile
org.springframework.boot:spring-boot-starter-jdbc:jar:1.5.1.RELEASE:compile
org.apache.tomcat:tomcat-jdbc:jar:8.5.11:compile
org.hibernate:hibernate-core:jar:5.0.11.Final:compile
org.springframework.data:spring-data-jpa:jar:1.11.0.RELEASE:compile
org.springframework.boot:spring-boot-starter-web:jar:1.5.1.RELEASE:compile
org.liquibase:liquibase-core:jar:3.5.1:compile
org.liquibase.ext:liquibase-hibernate5:jar:3.6:compile
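For completeness, the pool settings in the property dump above boil down to roughly the following explicit bean definition (a sketch reconstructed from the logged values; in the real application they are supplied through configuration properties, and the class and method names here are only illustrative):

import org.apache.tomcat.jdbc.pool.DataSource;
import org.apache.tomcat.jdbc.pool.PoolProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class DataSourceConfig {

    @Bean
    public DataSource dataSource() {
        PoolProperties p = new PoolProperties();
        p.setUrl("jdbc:mysql://127.0.0.1:3306/db_name?createDatabaseIfNotExist=true");
        p.setDriverClassName("com.mysql.jdbc.Driver");
        p.setUsername("username");
        p.setPassword("********");
        // Raised from the default of 100 to 500 as a temporary measure
        p.setMaxActive(500);
        p.setMaxIdle(500);
        p.setMinIdle(10);
        p.setInitialSize(10);
        p.setMaxWait(30000);
        p.setTestOnBorrow(true);
        p.setValidationQuery("SELECT 1");
        p.setValidationInterval(3000);
        p.setTimeBetweenEvictionRunsMillis(5000);
        p.setMinEvictableIdleTimeMillis(60000);
        DataSource ds = new DataSource();
        ds.setPoolProperties(p);
        return ds;
    }
}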
Finally, my ask is for help understanding why the application shuts itself down, and how I could fix it so that it is able to reach 500 connections.
Related
The Kafka service I deployed on the server passed testing a few days ago and was running normally, and no other operations were carried out afterwards.
Then a service call failed today, so I went in and looked at it. It had been fine these past few days, but just this afternoon the service went down. Looking at the log, it seems to say that the Kafka service restarted automatically and then ran into a conflict with the broker ID registered in ZooKeeper.
After restarting the service, everything is normal again. Google doesn't turn up a good explanation. What could the reason be?
The relevant log is below. Please help take a look, thank you.
[2021-11-06 14:43:29,139] WARN Unexpected exception (org.apache.zookeeper.server.NIOServerCnxn)
EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /159.89.4.236:59168, session = 0x0
at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:163)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:326)
at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2021-11-06 15:00:18,301] INFO Completed load of Log(dir=/tmp/kafka-logs/__consumer_offsets-46, topicId=dPs0Kq4ERHKDaR440Nt8qA, topic=__consumer_offsets, partition=46, highWatermark=0, lastStableOffset=0, logStartOffset=0, logEndOffset=0) with 1 segments in 20ms (53/53 loaded in /tmp/kafka-logs) (kafka.log.LogManager)
[2021-11-06 15:00:18,303] INFO Loaded 53 logs in 2350ms. (kafka.log.LogManager)
[2021-11-06 15:00:18,303] INFO Starting log cleanup with a period of 300000 ms. (kafka.log.LogManager)
[2021-11-06 15:00:18,304] INFO Starting log flusher with a default period of 9223372036854775807 ms. (kafka.log.LogManager)
[2021-11-06 15:00:18,838] INFO [BrokerToControllerChannelManager broker=0 name=forwarding]: Starting (kafka.server.BrokerToControllerRequestThread)
[2021-11-06 15:00:19,148] INFO Updated connection-accept-rate max connection creation rate to 2147483647 (kafka.network.ConnectionQuotas)
[2021-11-06 15:00:19,165] INFO Awaiting socket connections on 0.0.0.0:9092. (kafka.network.Acceptor)
[2021-11-06 15:00:19,237] INFO [SocketServer listenerType=ZK_BROKER, nodeId=0] Created data-plane acceptor and processors for endpoint : ListenerName(PLAINTEXT) (kafka.network.SocketServer)
[2021-11-06 15:00:19,282] INFO [BrokerToControllerChannelManager broker=0 name=alterIsr]: Starting (kafka.server.BrokerToControllerRequestThread)
[2021-11-06 15:00:19,400] INFO [ExpirationReaper-0-DeleteRecords]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2021-11-06 15:00:19,400] INFO [ExpirationReaper-0-Fetch]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2021-11-06 15:00:19,400] INFO [ExpirationReaper-0-Produce]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2021-11-06 15:00:19,409] INFO [ExpirationReaper-0-ElectLeader]: Starting (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[2021-11-06 15:00:19,451] INFO [LogDirFailureHandler]: Starting (kafka.server.ReplicaManager$LogDirFailureHandler)
[2021-11-06 15:00:19,530] INFO Creating /brokers/ids/0 (is it secure? false) (kafka.zk.KafkaZkClient)
[2021-11-06 15:00:19,566] ERROR Error while creating ephemeral at /brokers/ids/0, node already exists and owner '72057654275276801' does not match current session '72057654275276802' (kafka.zk.KafkaZkClient$CheckedEphemeral)
[2021-11-06 15:00:19,572] ERROR [KafkaServer id=0] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
at org.apache.zookeeper.KeeperException.create(KeeperException.java:126)
at kafka.zk.KafkaZkClient$CheckedEphemeral.getAfterNodeExists(KafkaZkClient.scala:1904)
at kafka.zk.KafkaZkClient$CheckedEphemeral.create(KafkaZkClient.scala:1842)
at kafka.zk.KafkaZkClient.checkedEphemeralCreate(KafkaZkClient.scala:1809)
at kafka.zk.KafkaZkClient.registerBroker(KafkaZkClient.scala:96)
at kafka.server.KafkaServer.startup(KafkaServer.scala:319)
at kafka.Kafka$.main(Kafka.scala:109)
at kafka.Kafka.main(Kafka.scala)
[2021-11-06 15:00:19,596] INFO [KafkaServer id=0] shutting down (kafka.server.KafkaServer)
[2021-11-06 15:00:19,597] INFO [SocketServer listenerType=ZK_BROKER, nodeId=0] Stopping socket server request processors (kafka.network.SocketServer)
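The ERROR lines above mean the broker's ephemeral registration node /brokers/ids/0 is still owned by the previous ZooKeeper session, so the restarted broker cannot re-register until that session expires. A small diagnostic sketch using the plain ZooKeeper Java client to see who currently owns the node (the connect string and timeout are assumptions, not taken from the question):

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class BrokerNodeCheck {
    public static void main(String[] args) throws Exception {
        // Connect to the same ZooKeeper ensemble the broker uses (assumed to be localhost:2181)
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
        try {
            Stat stat = zk.exists("/brokers/ids/0", false);
            if (stat == null) {
                System.out.println("/brokers/ids/0 does not exist - the broker can register");
            } else {
                // A non-zero ephemeralOwner is the session id that still holds the node;
                // compare it with the session id reported in the broker's error message.
                System.out.println("ephemeralOwner = " + stat.getEphemeralOwner());
            }
        } finally {
            zk.close();
        }
    }
}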
I have a single-step Spring Batch application. The job is as follows:
@Bean
public Job databaseCursorJob(@Qualifier("databaseCursorStep") Step exampleJobStep,
        JobBuilderFactory jobBuilderFactory) {
    return jobBuilderFactory.get("databaseCursorJob")
            .incrementer(new RunIdIncrementer())
            .flow(exampleJobStep)
            .end()
            .build();
}
I start the job from a Spring Boot application. This afternoon I attempted to add a second step to the job, essentially as follows:
@Bean
public Job databaseCursorJob(@Qualifier("databaseCursorStep") Step exampleJobStep,
        JobBuilderFactory jobBuilderFactory) {
    return jobBuilderFactory.get("databaseCursorJob")
            .incrementer(new RunIdIncrementer())
            .flow(exampleJobStep).next(partitionStep())
            .end()
            .build();
}
In other words, I just added the ".next(partitionStep())". However, ever since I did this, the job finishes without executing any step (see shell output below). In fact, even after removing the second step and going back to the original job, it refuses to execute the step. Before attempting to add the second step, I never once encountered this problem. I have gone so far as restarting my VM and it still skips the step. I am rather dead in the water until I resolve this. Grateful for any insights, thanks.
2020-09-01 14:49:00.260 INFO 6913 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8087 (http) with context path ''
2020-09-01 14:49:00.263 INFO 6913 --- [ main] f.p.r.Application : Started Application in 7.752 seconds (JVM running for 9.092)
2020-09-01 14:49:00.268 INFO 6913 --- [ main] o.s.b.a.b.JobLauncherCommandLineRunner : Running default command line with: []
2020-09-01 14:49:00.579 INFO 6913 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=databaseCursorJob]] launched with the following parameters: [{}]
2020-09-01 14:49:00.698 INFO 6913 --- [ main] o.s.batch.core.job.SimpleStepHandler : Step already complete or not restartable, so no action to execute: StepExecution: id=120, version=4, name=databaseCursorStep, status=COMPLETED, exitStatus=COMPLETED, readCount=1, filterCount=0, writeCount=1 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=2, rollbackCount=0, exitDescription=
2020-09-01 14:49:00.730 INFO 6913 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=databaseCursorJob]] completed with the following parameters: [{}] and the following status: [COMPLETED]
My issue was that my job had no way to recover if there was an error or it was stuck in an unknown state. The step was not "already complete"; it never completed. Its status was still "STARTED" and its exit code "UNKNOWN", because it never exited. Also, my job repository is not in memory but captured in a local DB, which is why it never resolved itself even after restarting the VM (shame on me for not remembering this). So I was able to fix it by wiping out the job instance history; however, that was a band-aid. I still have to fix my code to prevent it from happening again.
I also learned I could diagnose this by examining the job repository in the database (it's all there).
I was really able to resolve this thanks to Mr. Hassine, who responded several times above and pointed me in the right direction. The solution to prevent it in the future is indeed addressed in the link he provided in his first response: Spring Batch error (A Job Instance Already Exists) and RunIdIncrementer generates only once
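For anyone hitting the same thing: the direction from that link boils down to launching the job with parameters that are unique per run, rather than relying on the incrementer alone. A minimal sketch of a manual launch (assuming the automatic launch is disabled, e.g. with spring.batch.job.enabled=false; the runner class and parameter name are just illustrative):

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

@Component
public class DatabaseCursorJobRunner implements CommandLineRunner {

    private final JobLauncher jobLauncher;
    private final Job databaseCursorJob;

    public DatabaseCursorJobRunner(JobLauncher jobLauncher, Job databaseCursorJob) {
        this.jobLauncher = jobLauncher;
        this.databaseCursorJob = databaseCursorJob;
    }

    @Override
    public void run(String... args) throws Exception {
        // A fresh timestamp parameter forces a new JobInstance on every launch,
        // so a stuck or failed previous instance cannot block the step from running.
        JobParameters params = new JobParametersBuilder()
                .addLong("run.timestamp", System.currentTimeMillis())
                .toJobParameters();
        jobLauncher.run(databaseCursorJob, params);
    }
}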
So I have a very simple JMS listener running on Spring Boot, deployed to a Kubernetes cluster on Google Cloud.
The only thing I've defined is the following in my configuration class:
@Bean
public DefaultJmsListenerContainerFactory cFactory(ConnectionFactory connectionFactory, JmsErrorHandler errorHandler) {
    logger.info("cFactory called...");
    DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory();
    factory.setConnectionFactory(connectionFactory);
    factory.setErrorHandler(errorHandler);
    factory.setTransactionManager(null);
    factory.setSessionTransacted(false);
    return factory;
}
My application.properties:
solace.jms.host=tcps://[host]
solace.jms.clientUsername=[username]
solace.jms.clientPassword=[password]
solace.jms.msgVpn=[msgvpn]
solace.jms.queueName=[queuename]
server.port=8080
logging.level.com.solacesystems=INFO
JMS Listener:
@JmsListener(destination = "${solace.jms.queueName}", containerFactory = "cFactory")
public void onMessage(Message message) {
    // [Do stuff with message]
}
I see the following issue in the logs:
2020-02-28 21:08:40.247 INFO 1 --- [nio-8080-exec-6] c.s.j.protocol.impl.TcpClientChannel : Connecting to host 'orig=tcps://[host goes here], scheme=tcps://, host=[host], port=55443' (host 1 of 1, smfclient 294, attempt 1 of 1, this_host_attempt: 1 of 1)
2020-02-28 21:07:21.968 INFO 1 --- [enerContainer-1] c.m.a.notam.listener.JmsMessageListener : Message Received and processed
2020-02-28 21:07:20.572 INFO 1 --- [nio-8080-exec-8] c.s.jcsmp.protocol.smf.SSLSmfClient : closeOutbound() : isSslDowngradeEnabled: false, mSslEngineClosed: false
2020-02-28 21:07:20.572 INFO 1 --- [nio-8080-exec-8] c.s.j.protocol.impl.TcpClientChannel : Channel Closed (smfclient 262)
2020-02-28 21:07:20.535 INFO 1 --- [nio-8080-exec-8] c.s.j.protocol.impl.TcpClientChannel : Connected to host 'orig=tcps://[host]:55443, scheme=tcps://, host=[host], port=55443' (smfclient 262)
It basically loops like this all day long and I can't figure out why. When I run this locally on my development machine the connection remains open and messages are streaming in just fine.
There are no other logs that give me a clue as to why the channel is closing on its own like this.
Does anyone have any idea what the problem might be?
I have an application built with Spring Boot 2.1.2 and spring-data-jpa with Hibernate (MariaDB).
Recently I successfully introduced jOOQ (3.11.9) to test a massive insert/update, and it greatly reduced the insert times (Hibernate with PK auto-increment disables bulk inserts).
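The bulk insert that motivated introducing jOOQ looks roughly like this (a simplified sketch; the table and column names are made up, not my actual schema):

import static org.jooq.impl.DSL.field;
import static org.jooq.impl.DSL.table;

import java.util.ArrayList;
import java.util.List;

import org.jooq.DSLContext;
import org.jooq.Query;

public class VendorBulkInsert {

    // Builds one INSERT per row and sends them to the database as a single JDBC batch,
    // avoiding the per-row overhead we had with Hibernate.
    public void insertAll(DSLContext dsl, List<String[]> rows) {
        List<Query> inserts = new ArrayList<>();
        for (String[] row : rows) {
            inserts.add(dsl.insertInto(table("vendor_list"))
                    .columns(field("code"), field("name"))
                    .values(row[0], row[1]));
        }
        dsl.batch(inserts).execute();
    }
}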
In development, however, the application's start-up time has greatly increased (from 22 seconds to 80+ seconds).
After increasing the log level I can see the instructions where the startup slows down:
2019-12-12 12:48:32.580 TRACE 8960 --- [ restartedMain] .PrePostAnnotationSecurityMetadataSource : Looking for Pre/Post annotations for method 'dsl' on target class 'class org.jooq.impl.DefaultDSLContext'
2019-12-12 12:48:32.580 TRACE 8960 --- [ restartedMain] .PrePostAnnotationSecurityMetadataSource : No expression annotations found
2019-12-12 12:48:32.580 TRACE 8960 --- [ restartedMain] .PrePostAnnotationSecurityMetadataSource : Looking for Pre/Post annotations for method 'close' on target class 'class org.jooq.impl.DefaultDSLContext'
2019-12-12 12:48:32.581 TRACE 8960 --- [ restartedMain] .PrePostAnnotationSecurityMetadataSource : No expression annotations found
2019-12-12 12:49:01.858 DEBUG 8960 --- [l-1 housekeeper] com.zaxxer.hikari.pool.HikariPool : HikariPool-1 - Pool stats (total=10, active=0, idle=10, waiting=0)
2019-12-12 12:49:19.248 DEBUG 8960 --- [alina-utility-1] org.apache.catalina.session.ManagerBase : Start expire sessions StandardManager at 1576151359246 sessioncount 0
2019-12-12 12:49:19.248 DEBUG 8960 --- [alina-utility-1] org.apache.catalina.session.ManagerBase : End expire sessions StandardManager processingTime 2 expired sessions: 0
2019-12-12 12:49:31.860 DEBUG 8960 --- [l-1 housekeeper] com.zaxxer.hikari.pool.HikariPool : HikariPool-1 - Pool stats (total=10, active=0, idle=10, waiting=0)
2019-12-12 12:49:56.970 TRACE 8960 --- [ restartedMain] o.s.b.f.s.DefaultListableBeanFactory : Finished creating instance of bean 'dsl'
2019-12-12 12:49:56.970 TRACE 8960 --- [ restartedMain] f.a.AutowiredAnnotationBeanPostProcessor : Autowiring by type from bean name 'lstVendorListService' to bean named 'dsl'
2019-12-12 12:49:56.970 TRACE 8960 --- [ restartedMain] o.s.b.f.s.DefaultListableBeanFactory : Returning cached instance of singleton bean 'metaDataSourceAdvisor'
The number of jOOQ entities does not change the start-up time.
Has anyone ever experienced a similar problem?
Recently I've been upgrading an application from Grails 2.5.5 to Grails 3.2.9.
The application is serving ~3K RPM.
The issue I currently have is poor performance after application startup. Our normal release process works in the following way (assuming a 2-node setup):
2 nodes with the service are running.
Turn off the first node.
Release the new version on the first node.
The service on the first node registers itself in Eureka and starts consuming requests.
(Repeat the same on the second node.)
Problems start to appear at the 4th step. The application responds quite slowly and the timing is really inconsistent: some responses are in the expected time range, but some are way off normal.
Sample logs:
2017-09-01 08:03:38,594 INFO [request][http-nio-12345-exec-72][][]- END controller=98ms
2017-09-01 08:03:38,911 INFO [request][http-nio-12345-exec-101][][]- END controller=134ms
2017-09-01 08:03:38,948 INFO [request][http-nio-12345-exec-56][][]- END controller=211ms
2017-09-01 08:03:39,156 INFO [request][http-nio-12345-exec-82][][]- END controller=95ms
2017-09-01 08:03:39,124 INFO [request][http-nio-12345-exec-111][][]- END controller=98ms
2017-09-01 08:03:39,184 INFO [request][http-nio-12345-exec-110][][]- END controller=4099ms
2017-09-01 08:03:39,399 INFO [request][http-nio-12345-exec-46][][]- END controller=24ms
2017-09-01 08:03:39,428 INFO [request][http-nio-12345-exec-43][][]- END controller=191ms
2017-09-01 08:03:39,744 INFO [request][http-nio-12345-exec-83][][]- END controller=117ms
2017-09-01 08:03:40,335 INFO [request][http-nio-12345-exec-56][][]- END controller=483ms
2017-09-01 08:03:45,595 INFO [request][http-nio-12345-exec-110][][]- END controller=5623ms
2017-09-01 08:03:45,618 INFO [request][http-nio-12345-exec-83][][]- END controller=5274ms
2017-09-01 08:03:45,629 INFO [request][http-nio-12345-exec-144][][]- END controller=2007ms
2017-09-01 08:03:45,671 INFO [request][http-nio-12345-exec-119][][]- END controller=4591ms
As you can see, a few requests took under 100 ms, while some took more than 5 seconds.
My assumption is that this is happening due to slow warm-up of the Grails 3 application and lazy class loading.
Things I've already done:
grails.gorm.autowire = false
grails.gorm.reactor.events = false
Delay registration of the service in Eureka for 30 seconds (to wait until the application is fully loaded).
The next thing that comes to mind is to compile the project with the @CompileStatic annotation.