Poor performance on Grails 3.2.9 application startup

Recently I've been updating an application from Grails 2.5.5 to Grails 3.2.9. The application serves ~3K RPM.
The issue I currently have is poor performance after application startup. Our normal release process works in the following way (assuming a 2-node setup):
1. 2 nodes with the service are running.
2. Turn off the first node.
3. Release the new version on the first node.
4. The service on the first node registers itself in Eureka and starts consuming requests.
5. Repeat the same on the second node.
Problems start to appear at step 4. The application responds quite slowly, and response times are really inconsistent: some responses are within the expected range, but some are far outside it.
Sample logs:
2017-09-01 08:03:38,594 INFO [request][http-nio-12345-exec-72][][]- END controller=98ms
2017-09-01 08:03:38,911 INFO [request][http-nio-12345-exec-101][][]- END controller=134ms
2017-09-01 08:03:38,948 INFO [request][http-nio-12345-exec-56][][]- END controller=211ms
2017-09-01 08:03:39,156 INFO [request][http-nio-12345-exec-82][][]- END controller=95ms
2017-09-01 08:03:39,124 INFO [request][http-nio-12345-exec-111][][]- END controller=98ms
2017-09-01 08:03:39,184 INFO [request][http-nio-12345-exec-110][][]- END controller=4099ms
2017-09-01 08:03:39,399 INFO [request][http-nio-12345-exec-46][][]- END controller=24ms
2017-09-01 08:03:39,428 INFO [request][http-nio-12345-exec-43][][]- END controller=191ms
2017-09-01 08:03:39,744 INFO [request][http-nio-12345-exec-83][][]- END controller=117ms
2017-09-01 08:03:40,335 INFO [request][http-nio-12345-exec-56][][]- END controller=483ms
2017-09-01 08:03:45,595 INFO [request][http-nio-12345-exec-110][][]- END controller=5623ms
2017-09-01 08:03:45,618 INFO [request][http-nio-12345-exec-83][][]- END controller=5274ms
2017-09-01 08:03:45,629 INFO [request][http-nio-12345-exec-144][][]- END controller=2007ms
2017-09-01 08:03:45,671 INFO [request][http-nio-12345-exec-119][][]- END controller=4591ms
As you can see, some requests completed in under 100 ms while others took more than 5 seconds.
My assumption is that this happens due to the slow warm-up of the Grails 3 application and lazy class loading.
Things I've already done:
grails.gorm.autowire = false
grails.gorm.reactor.events = false
Delayed service registration in Eureka by 30 seconds (to wait until the application is fully loaded)
The next thing that comes to mind is to compile the project with the @CompileStatic annotation.
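Another option I'm considering is to warm the instance up with synthetic traffic before it starts taking real requests, instead of just waiting 30 seconds. A rough sketch of such a hook for a Grails 3/Spring Boot app; the paths and iteration count are placeholders for illustration, not actual endpoints:

import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.ApplicationListener;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

// Rough warm-up hook: replay a few representative requests against the local
// instance before it starts taking real traffic, so class loading, GORM
// initialization and JIT compilation happen on synthetic requests.
@Component
public class WarmUpListener implements ApplicationListener<ApplicationReadyEvent> {

    // Placeholder endpoints; real ones would cover the hottest controllers.
    private static final String[] WARM_UP_PATHS = {"/api/health", "/api/sample"};

    private final RestTemplate restTemplate = new RestTemplate();

    @Override
    public void onApplicationEvent(ApplicationReadyEvent event) {
        for (int i = 0; i < 100; i++) { // enough repetitions to trigger JIT compilation
            for (String path : WARM_UP_PATHS) {
                try {
                    restTemplate.getForObject("http://localhost:12345" + path, String.class);
                } catch (Exception ignored) {
                    // Warm-up failures are non-fatal; keep going.
                }
            }
        }
    }
}

That way the 30-second Eureka delay would be spent exercising controllers instead of idling.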

Related

Optaplanner's benchmark warm up - OutOfMemory

While trying to test the solution's solvers using a benchmark configuration, I encountered the following exception:
2021-12-22 15:24:37.328 WARN 22684 --- [ Test worker] c.o.b.i.D.singleBenchmarkRunnerException : The warm up singleBenchmarkRunner (Problem_0_Currently used_0) with random seed (null) failed.
java.lang.OutOfMemoryError: Java heap space
2021-12-22 15:24:37.329 WARN 22684 --- [ Test worker] c.o.b.i.D.singleBenchmarkRunnerException : The warm up singleBenchmarkRunner (Problem_0_Currently used_0) with random seed (null) failed.
java.lang.OutOfMemoryError: Java heap space
2021-12-22 15:24:37.330 WARN 22684 --- [ Test worker] c.o.b.i.D.singleBenchmarkRunnerException : The warm up singleBenchmarkRunner (Problem_0_Currently used_0) with random seed (null) failed.
java.lang.OutOfMemoryError: Java heap space
at java.base/java.lang.Long.valueOf(Long.java:1207)
at myrostering.solver.PEC.LambdaExtractorEC9F24820AB70C5865CE63ED29F967E9.apply(LambdaExtractorEC9F24820AB70C5865CE63ED29F967E9.java:69)
at myrostering.solver.PEC.LambdaExtractorEC9F24820AB70C5865CE63ED29F967E9.apply(LambdaExtractorEC9F24820AB70C5865CE63ED29F967E9.java:1)
at org.drools.model.functions.Function1$Impl.apply(Function1.java:35)
at org.drools.modelcompiler.constraints.LambdaReadAccessor.getValue(LambdaReadAccessor.java:42)
at org.drools.core.rule.Declaration.getValue(Declaration.java:258)
at org.drools.core.rule.Declaration.getValue(Declaration.java:253)
at org.drools.modelcompiler.constraints.BindingEvaluator.getArgument(BindingEvaluator.java:59)
at org.drools.modelcompiler.constraints.ConstraintEvaluator$InnerEvaluator.getArgument(ConstraintEvaluator.java:242)
at org.drools.modelcompiler.constraints.ConstraintEvaluator$InnerEvaluator$_2.evaluate(ConstraintEvaluator.java:309)
at org.drools.modelcompiler.constraints.ConstraintEvaluator.evaluate(ConstraintEvaluator.java:124)
at org.drools.modelcompiler.constraints.LambdaConstraint.isAllowedCachedLeft(LambdaConstraint.java:187)
at org.drools.core.common.SingleBetaConstraints.isAllowedCachedLeft(SingleBetaConstraints.java:132)
at org.drools.core.phreak.PhreakAccumulateNode.doLeftInserts(PhreakAccumulateNode.java:178)
at org.drools.core.phreak.PhreakAccumulateNode.doNode(PhreakAccumulateNode.java:89)
at org.drools.core.phreak.RuleNetworkEvaluator.switchOnDoBetaNode(RuleNetworkEvaluator.java:591)
at org.drools.core.phreak.RuleNetworkEvaluator.evalBetaNode(RuleNetworkEvaluator.java:558)
at org.drools.core.phreak.RuleNetworkEvaluator.evalNode(RuleNetworkEvaluator.java:385)
at org.drools.core.phreak.RuleNetworkEvaluator.innerEval(RuleNetworkEvaluator.java:345)
at org.drools.core.phreak.RuleNetworkEvaluator.outerEval(RuleNetworkEvaluator.java:181)
at org.drools.core.phreak.RuleNetworkEvaluator.evaluateNetwork(RuleNetworkEvaluator.java:139)
at org.drools.core.phreak.RuleExecutor.reEvaluateNetwork(RuleExecutor.java:235)
at org.drools.core.phreak.RuleExecutor.evaluateNetworkAndFire(RuleExecutor.java:91)
at org.drools.core.concurrent.AbstractRuleEvaluator.internalEvaluateAndFire(AbstractRuleEvaluator.java:33)
at org.drools.core.concurrent.SequentialRuleEvaluator.evaluateAndFire(SequentialRuleEvaluator.java:43)
at org.drools.core.common.DefaultAgenda.fireLoop(DefaultAgenda.java:753)
at org.drools.core.common.DefaultAgenda.internalFireAllRules(DefaultAgenda.java:700)
at org.drools.core.common.DefaultAgenda.fireAllRules(DefaultAgenda.java:692)
at org.drools.core.impl.StatefulKnowledgeSessionImpl.internalFireAllRules(StatefulKnowledgeSessionImpl.java:1225)
at org.drools.core.impl.StatefulKnowledgeSessionImpl.fireAllRules(StatefulKnowledgeSessionImpl.java:1216)
at org.drools.core.impl.StatefulKnowledgeSessionImpl.fireAllRules(StatefulKnowledgeSessionImpl.java:1200)
at org.optaplanner.core.impl.score.director.drools.DroolsScoreDirector.calculateScore(DroolsScoreDirector.java:105)
Here is the test class I ran:
@SpringBootTest(classes = MyApplication.class)
@EnableConfigurationProperties({ApplicationProperties.class, MyRosterProperties.class})
public class SolverBenchmarkTest {

    private PlannerBenchmarkFactory benchmarkFactory = PlannerBenchmarkFactory.createFromXmlResource(
            "myrostering/benchmark/benchmarkSolverConfig.xml");

    @Autowired
    MyRosterGenerator myRosterGenerator;

    @Test
    public void benchmarkBasicRostering() {
        MyRoster mr = myRosterGenerator.createMyRoster();
        PlannerBenchmark benchmark = benchmarkFactory.buildPlannerBenchmark(mr);
        benchmark.benchmarkAndShowReportInBrowser();
    }
}
Here is the benchmark configuration file:
<?xml version="1.0" encoding="UTF-8"?>
<plannerBenchmark xmlns="https://www.optaplanner.org/xsd/benchmark" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  xsi:schemaLocation="https://www.optaplanner.org/xsd/benchmark https://www.optaplanner.org/xsd/benchmark/benchmark.xsd">
  <benchmarkDirectory>local/benchmark/data/my-roster</benchmarkDirectory>
  <parallelBenchmarkCount>AUTO</parallelBenchmarkCount>
  <warmUpSecondsSpentLimit>30</warmUpSecondsSpentLimit>
  <inheritedSolverBenchmark>
    <solver>
      <!-- This part of the solver configuration must be the same as the one used by the planner, otherwise, the benchmark test is pointless -->
      <moveThreadCount>4</moveThreadCount>
      <solutionClass>myrostering.domain.MyRoster</solutionClass>
      <entityClass>myrostering.domain.Assignment</entityClass>
      <scoreDirectorFactory>
        <scoreDrl>myrostering/solver/myRosteringScoreRules.drl</scoreDrl>
      </scoreDirectorFactory>
      <termination>
        <!-- Adding this secondsSpentLimit (contrary to no limit set for the planner) to avoid the benchmark running for too long -->
        <secondsSpentLimit>60</secondsSpentLimit>
        <bestScoreLimit>0hard/0medium/0soft</bestScoreLimit>
      </termination>
      <constructionHeuristic>
        <constructionHeuristicType>STRONGEST_FIT</constructionHeuristicType>
      </constructionHeuristic>
    </solver>
  </inheritedSolverBenchmark>
  <solverBenchmark>
    <name>Currently used</name>
    <solver>
      <localSearch>
        <unionMoveSelector>
          <moveListFactory>
            <cacheType>PHASE</cacheType>
            <moveListFactoryClass>myrostering.solver.move.factory.ChangeMoveFactory</moveListFactoryClass>
          </moveListFactory>
          <moveListFactory>
            <cacheType>PHASE</cacheType>
            <moveListFactoryClass>myrostering.solver.move.factory.SwapMoveFactory</moveListFactoryClass>
          </moveListFactory>
        </unionMoveSelector>
        <acceptor>
          <entityTabuSize>5</entityTabuSize>
          <simulatedAnnealingStartingTemperature>15000hard/10medium/1000soft</simulatedAnnealingStartingTemperature>
        </acceptor>
        <forager>
          <acceptedCountLimit>4</acceptedCountLimit>
        </forager>
      </localSearch>
    </solver>
  </solverBenchmark>
</plannerBenchmark>
Also, I'd like to add that we run solver.solve() without any issue, even though the dataset is quite large (150 to 300 MB for the file containing the serialized solution). So I'm a bit surprised that the benchmark fails during warm-up...
EDIT:
I've changed the configuration for these two parameters:
<parallelBenchmarkCount>1</parallelBenchmarkCount>
...
<secondsSpentLimit>600</secondsSpentLimit>
But I still got the following exception:
2022-01-03 10:53:49.850 INFO 21696 --- [nchmarkThread-1] o.d.c.kie.builder.impl.KieContainerImpl : Start creation of KieBase: defaultKieBase
2022-01-03 10:53:49.909 INFO 21696 --- [nchmarkThread-1] o.d.c.kie.builder.impl.KieContainerImpl : End creation of KieBase: defaultKieBase
2022-01-03 10:54:32.585 INFO 21696 --- [nchmarkThread-1] o.o.core.impl.solver.DefaultSolver : Solving started: time spent (41506), best score (-38295462hard/38260medium/3640soft), environment mode (REPRODUCIBLE), move thread count (4), random (JDK with seed 0).
2022-01-03 10:54:33.611 ERROR 21696 --- [nchmarkThread-1] o.o.core.impl.solver.thread.ThreadUtils : Multithreaded Local Search's ExecutorService didn't terminate within timeout (1 seconds).
2022-01-03 10:54:33.611 INFO 21696 --- [nchmarkThread-1] o.o.c.i.h.thread.MoveThreadRunner : Score calculation speed will be too low because move thread (0)'s destroy wasn't processed soon enough.
2022-01-03 10:54:33.611 INFO 21696 --- [nchmarkThread-1] o.o.c.i.h.thread.MoveThreadRunner : Score calculation speed will be too low because move thread (1)'s destroy wasn't processed soon enough.
2022-01-03 10:54:33.611 INFO 21696 --- [nchmarkThread-1] o.o.c.i.h.thread.MoveThreadRunner : Score calculation speed will be too low because move thread (2)'s destroy wasn't processed soon enough.
2022-01-03 10:54:33.611 INFO 21696 --- [nchmarkThread-1] o.o.c.i.h.thread.MoveThreadRunner : Score calculation speed will be too low because move thread (3)'s destroy wasn't processed soon enough.
2022-01-03 10:54:33.612 INFO 21696 --- [nchmarkThread-1] .c.i.c.DefaultConstructionHeuristicPhase : Construction Heuristic phase (0) ended: time spent (42533), best score (-38295462hard/38260medium/3640soft), score calculation speed (0/sec), step total (0).
2022-01-03 10:56:00.115 WARN 21696 --- [ Test worker] c.o.b.i.D.singleBenchmarkRunnerException : The subSingleBenchmarkRunner (Problem_0_Currently used_0) with random seed (null) failed.
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3480)
at java.base/java.util.ArrayList.grow(ArrayList.java:237)
at java.base/java.util.ArrayList.grow(ArrayList.java:244)
at java.base/java.util.ArrayList.add(ArrayList.java:454)
at java.base/java.util.ArrayList.add(ArrayList.java:467)
at be.myrostering.solver.move.factory.MySwapMoveFactory.createMoveList(MySwapMoveFactory.java:50)
at be.myrostering.solver.move.factory.MySwapMoveFactory.createMoveList(MySwapMoveFactory.java:30)
at org.optaplanner.core.impl.heuristic.selector.move.factory.MoveListFactoryToMoveSelectorBridge.constructCache(MoveListFactoryToMoveSelectorBridge.java:72)
at org.optaplanner.core.impl.heuristic.selector.common.SelectionCacheLifecycleBridge.phaseStarted(SelectionCacheLifecycleBridge.java:51)
at org.optaplanner.core.impl.phase.event.PhaseLifecycleSupport.firePhaseStarted(PhaseLifecycleSupport.java:37)
at org.optaplanner.core.impl.heuristic.selector.AbstractSelector.phaseStarted(AbstractSelector.java:50)
at org.optaplanner.core.impl.phase.event.PhaseLifecycleSupport.firePhaseStarted(PhaseLifecycleSupport.java:37)
at org.optaplanner.core.impl.heuristic.selector.AbstractSelector.phaseStarted(AbstractSelector.java:50)
at org.optaplanner.core.impl.localsearch.decider.LocalSearchDecider.phaseStarted(LocalSearchDecider.java:94)
at org.optaplanner.core.impl.localsearch.decider.MultiThreadedLocalSearchDecider.phaseStarted(MultiThreadedLocalSearchDecider.java:92)
at org.optaplanner.core.impl.localsearch.DefaultLocalSearchPhase.phaseStarted(DefaultLocalSearchPhase.java:141)
at org.optaplanner.core.impl.localsearch.DefaultLocalSearchPhase.solve(DefaultLocalSearchPhase.java:82)
at org.optaplanner.core.impl.solver.AbstractSolver.runPhases(AbstractSolver.java:99)
at org.optaplanner.core.impl.solver.DefaultSolver.solve(DefaultSolver.java:192)
at org.optaplanner.benchmark.impl.SubSingleBenchmarkRunner.call(SubSingleBenchmarkRunner.java:122)
at org.optaplanner.benchmark.impl.SubSingleBenchmarkRunner.call(SubSingleBenchmarkRunner.java:42)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
at java.base/java.lang.Thread.run(Thread.java:832)
2022-01-03 10:56:00.603 INFO 21696 --- [ Test worker] o.o.b.impl.report.BenchmarkReport : Generating benchmark report...
VERSION_2_3_31
java.lang.NoSuchFieldError: VERSION_2_3_31
at org.optaplanner.benchmark.impl.report.BenchmarkReport.writeHtmlOverviewFile(BenchmarkReport.java:828)
at org.optaplanner.benchmark.impl.report.BenchmarkReport.writeReport(BenchmarkReport.java:318)
at org.optaplanner.benchmark.impl.DefaultPlannerBenchmark.benchmarkingEnded(DefaultPlannerBenchmark.java:311)
at org.optaplanner.benchmark.impl.DefaultPlannerBenchmark.benchmark(DefaultPlannerBenchmark.java:100)
at org.optaplanner.benchmark.impl.DefaultPlannerBenchmark.benchmarkAndShowReportInBrowser(DefaultPlannerBenchmark.java:424)
On a final note, it appears that the problem might not be linked to OptaPlanner (because the out-of-memory error is triggered in MySwapMoveFactory); if so, I'll close this post. But it would still be odd that it works when running the solver but not the benchmark...
MoveListFactory scales badly, consuming a lot of memory and CPU.
Some kinds of moves have billions of possible moves. For example, a 3-swap on 10,000 shifts has on the order of 10,000³ = 1 trillion possible moves. That doesn't fit into a few GB of RAM, and it takes ages to generate.
Use a MoveIteratorFactory instead and don't generate a list of moves; generate them just in time, just like the default selectors do. See the docs and the sketch below.
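Here's a rough sketch of the just-in-time style, assuming OptaPlanner 8's MoveIteratorFactory contract; MyRoster, Assignment, getAssignmentList() and MySwapMove are stand-ins for the question's domain and move classes, not real API:

import java.util.Iterator;
import java.util.List;
import java.util.Random;

import org.optaplanner.core.api.score.director.ScoreDirector;
import org.optaplanner.core.impl.heuristic.selector.move.factory.MoveIteratorFactory;

// Generates one random swap move at a time instead of caching the full list.
public class MySwapMoveIteratorFactory implements MoveIteratorFactory<MyRoster, MySwapMove> {

    @Override
    public long getSize(ScoreDirector<MyRoster> scoreDirector) {
        long n = scoreDirector.getWorkingSolution().getAssignmentList().size();
        return n * (n - 1) / 2; // number of possible swaps, never materialized
    }

    @Override
    public Iterator<MySwapMove> createOriginalMoveIterator(ScoreDirector<MyRoster> scoreDirector) {
        throw new UnsupportedOperationException("Only random selection is used here");
    }

    @Override
    public Iterator<MySwapMove> createRandomMoveIterator(ScoreDirector<MyRoster> scoreDirector,
            Random workingRandom) {
        List<Assignment> assignments = scoreDirector.getWorkingSolution().getAssignmentList();
        return new Iterator<MySwapMove>() {
            @Override
            public boolean hasNext() {
                return assignments.size() >= 2;
            }

            @Override
            public MySwapMove next() {
                // Build a single move on demand; nothing is cached per phase.
                Assignment left = assignments.get(workingRandom.nextInt(assignments.size()));
                Assignment right = assignments.get(workingRandom.nextInt(assignments.size()));
                return new MySwapMove(left, right);
            }
        };
    }
}

In the benchmark XML, each <moveListFactory> block would then become a <moveIteratorFactory> block with a <moveIteratorFactoryClass> element and no PHASE cache (check the custom move docs for your exact version).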
Increase memory, for example with the JVM option -Xmx4g.
Also note that parallelBenchmarkCount AUTO currently doesn't take into account that moveThreadCount is not NONE, so your benchmarks will not be accurate: if you have 16 cores, parallelBenchmarkCount AUTO will resolve to 8. With moveThreadCount 4 (+ 1 solver thread) you'd be demanding 32+ cores while only 16 are available. This should probably be reported as an issue in OptaPlanner's JIRA for parallelBenchmarkCount AUTO.
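For example, on a 16-core machine with moveThreadCount 4 plus 1 solver thread per benchmark, at most 16 / (4 + 1) = 3 benchmarks actually fit in parallel, so pinning the count explicitly is safer than AUTO for now (illustrative value):

<!-- Illustrative: 16 cores / (4 move threads + 1 solver thread) = 3 parallel benchmarks -->
<parallelBenchmarkCount>3</parallelBenchmarkCount>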

Spring application shuts itself down at 200 DB connections

On a legacy production application we were having an issue where the application crashed because it ran out of connections (the default was 100). As a temporary solution we decided to increase the available connections to 500, but when the application reached 200 connections it simply stopped itself, with no errors in the logs, just like a plain shutdown.
I added a couple of log statements, emitted every 15 seconds, to observe the behavior of the connections; they print the idle and active connection counts as well as the full DataSource properties object. Before the application shut down, the following entries were logged:
Datasource idle connections: 0, active connections: 200
Datasource properties: org.apache.tomcat.jdbc.pool.DataSource@20b2475a{ConnectionPool[defaultAutoCommit=null; defaultReadOnly=null; defaultTransactionIsolation=-1; defaultCatalog=null; driverClassName=com.mysql.jdbc.Driver; maxActive=500; maxIdle=500; minIdle=10; initialSize=10; maxWait=30000; testOnBorrow=true; testOnReturn=false; timeBetweenEvictionRunsMillis=5000; numTestsPerEvictionRun=0; minEvictableIdleTimeMillis=60000; testWhileIdle=false; testOnConnect=false; password=********; url=jdbc:mysql://127.0.0.1:3306/db_name?createDatabaseIfNotExist=true; username=username; validationQuery=SELECT 1; validationQueryTimeout=-1; validatorClassName=null; validationInterval=3000; accessToUnderlyingConnectionAllowed=true; removeAbandoned=false; removeAbandonedTimeout=60; logAbandoned=false; connectionProperties=null; initSQL=null; jdbcInterceptors=null; jmxEnabled=true; fairQueue=true; useEquals=true; abandonWhenPercentageFull=0; maxAge=0; useLock=false; dataSource=null; dataSourceJNDI=null; suspectTimeout=0; alternateUsernameAllowed=false; commitOnReturn=false; rollbackOnReturn=false; useDisposableConnectionFacade=true; logValidationErrors=false; propagateInterruptState=false; ignoreExceptionOnPreLoad=false; }
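(For reference, the 15-second logging mentioned above boils down to the following sketch; it assumes the concrete pool type org.apache.tomcat.jdbc.pool.DataSource is the injected bean and that @EnableScheduling is configured.)

import org.apache.tomcat.jdbc.pool.DataSource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Sketch of the periodic pool logger described above; prints idle/active
// counts plus the pool configuration every 15 seconds.
@Component
public class PoolStatsLogger {

    private static final Logger log = LoggerFactory.getLogger(PoolStatsLogger.class);

    private final DataSource dataSource;

    public PoolStatsLogger(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Scheduled(fixedRate = 15000) // every 15 seconds
    public void logPoolStats() {
        log.info("Datasource idle connections: {}, active connections: {}",
                dataSource.getIdle(), dataSource.getActive());
        log.info("Datasource properties: {}", dataSource.getPoolProperties());
    }
}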
After that, the application shut itself down, and I found the following log entries, with no errors before them:
2021-02-03 20:23:02.618 INFO 1 --- [ Thread-4] ationConfigEmbeddedWebApplicationContext : Closing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@8807e25: startup date [Wed Feb 03 19:49:09 GMT 2021]; root of context hierarchy
2021-02-03 20:23:02.623 INFO 1 --- [ Thread-4] o.s.c.support.DefaultLifecycleProcessor : Stopping beans in phase 0
2021-02-03 20:23:02.643 INFO 1 --- [ Thread-4] o.s.j.e.a.AnnotationMBeanExporter : Unregistering JMX-exposed beans on shutdown
2021-02-03 20:23:02.647 INFO 1 --- [ Thread-4] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'
A couple of relevant dependencies and their versions:
org.springframework:spring-webmvc:jar:4.3.6.RELEASE:compile
org.springframework.boot:spring-boot-starter-data-jpa:jar:1.5.1.RELEASE:compile
org.springframework.boot:spring-boot-starter-jdbc:jar:1.5.1.RELEASE:compile
org.apache.tomcat:tomcat-jdbc:jar:8.5.11:compile
org.hibernate:hibernate-core:jar:5.0.11.Final:compile
org.springframework.data:spring-data-jpa:jar:1.11.0.RELEASE:compile
org.springframework.boot:spring-boot-starter-web:jar:1.5.1.RELEASE:compile
org.liquibase:liquibase-core:jar:3.5.1:compile
org.liquibase.ext:liquibase-hibernate5:jar:3.6:compile
Finally, I am asking for help to understand why the application shuts itself down, and how I could fix it so that it can reach 500 connections.

SpringBatch - Step no longer executing: Step already complete or not restartable

I have a single-step Spring Batch application. The job is as follows:
@Bean
public Job databaseCursorJob(@Qualifier("databaseCursorStep") Step exampleJobStep,
        JobBuilderFactory jobBuilderFactory) {
    return jobBuilderFactory.get("databaseCursorJob")
            .incrementer(new RunIdIncrementer())
            .flow(exampleJobStep)
            .end()
            .build();
}
I start the job from a Spring Boot application. This afternoon I attempted to add a second step to the job, essentially as follows:
@Bean
public Job databaseCursorJob(@Qualifier("databaseCursorStep") Step exampleJobStep,
        JobBuilderFactory jobBuilderFactory) {
    return jobBuilderFactory.get("databaseCursorJob")
            .incrementer(new RunIdIncrementer())
            .flow(exampleJobStep).next(partitionStep())
            .end()
            .build();
}
In other words, I just added the .next(partitionStep()) call. However, ever since I did this, the job finishes without executing any step (see the shell output below). In fact, even after removing the second step and going back to the original job, it refuses to execute the step. Before attempting to add the second step, I never once encountered this problem. I have gone so far as restarting my VM and it still skips the step. I am rather dead in the water until I resolve this. Grateful for any insights, thanks.
2020-09-01 14:49:00.260 INFO 6913 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8087 (http) with context path ''
2020-09-01 14:49:00.263 INFO 6913 --- [ main] f.p.r.Application : Started Application in 7.752 seconds (JVM running for 9.092)
2020-09-01 14:49:00.268 INFO 6913 --- [ main] o.s.b.a.b.JobLauncherCommandLineRunner : Running default command line with: []
2020-09-01 14:49:00.579 INFO 6913 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=databaseCursorJob]] launched with the following parameters: [{}]
2020-09-01 14:49:00.698 INFO 6913 --- [ main] o.s.batch.core.job.SimpleStepHandler : Step already complete or not restartable, so no action to execute: StepExecution: id=120, version=4, name=databaseCursorStep, status=COMPLETED, exitStatus=COMPLETED, readCount=1, filterCount=0, writeCount=1 readSkipCount=0, writeSkipCount=0, processSkipCount=0, commitCount=2, rollbackCount=0, exitDescription=
2020-09-01 14:49:00.730 INFO 6913 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=databaseCursorJob]] completed with the following parameters: [{}] and the following status: [COMPLETED]
My issue was that my job had no way to recover if an error occurred or it got stuck in an unknown state. The step was not "already complete"; it never completed. Its status was still "STARTED" and its exit code "UNKNOWN", because it never exited. My job repository is not in memory but persisted to a local DB, which is why the problem never resolved itself even after restarting the VM (shame on me for not remembering this). I was able to fix it by wiping out the job instance history, but that was a band-aid; I still have to fix my code to prevent it from happening again.
I also learned I could diagnose by examining the job repository in the database (its all there).
I really owe the resolution to Mr. Hassine, who responded above several times and pointed me in the right direction. The way to prevent this in the future is indeed addressed in the link he provided in his first response: Spring Batch error (A Job Instance Already Exists) and RunIdIncrementer generates only once.
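For anyone hitting the same wall, here is a minimal sketch of that preventive idea: launch the job with an explicitly unique parameter so every run creates a fresh JobInstance instead of colliding with a stale STARTED/UNKNOWN execution (class and parameter names here are illustrative, not from the original post):

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class JobRunner {

    private final JobLauncher jobLauncher;
    private final Job databaseCursorJob;

    public JobRunner(JobLauncher jobLauncher, Job databaseCursorJob) {
        this.jobLauncher = jobLauncher;
        this.databaseCursorJob = databaseCursorJob;
    }

    public JobExecution runOnce() throws Exception {
        // A unique parameter forces a new JobInstance on every launch, so a
        // previous execution stuck in STARTED/UNKNOWN can no longer block the step.
        JobParameters params = new JobParametersBuilder()
                .addLong("launch.time", System.currentTimeMillis())
                .toJobParameters();
        return jobLauncher.run(databaseCursorJob, params);
    }
}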

apoc.periodic.iterate fails with exception: java.util.concurrent.RejectedExecutionException

I am trying to run the annotation function of GraphAware within Neo4j (see documentation here). I have a set of 5000 nodes (KnowledgeArticles) with textual data in the content property. To annotate them I run the following query in Neo4j Desktop:
CALL apoc.periodic.iterate(
  "MATCH (n:KnowledgeArticle) RETURN n",
  "CALL ga.nlp.annotate({text: n.content, id: id(n)})
   YIELD result MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)",
  {batchSize:1, iterateList:true})
After annotating approximately 200 to 300 KnowledgeArticles the database shuts down and provides the error:
Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure `apoc.periodic.iterate`: Caused by:
java.util.concurrent.RejectedExecutionException: Task
java.util.concurrent.FutureTask@373b81ee rejected from
java.util.concurrent.ThreadPoolExecutor@285a2901[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 288]
I have experimented with different values for batchSize and with setting iterateList to false, but none of this helped.
Also, I have tried the above iterate call limited to only 150 nodes. This works fine the first time I call it, but when I run it a second time it produces the same error, stating that the completed task count is around 200 to 300. The executor in the background thus seems to 'remember' the total number of tasks it has run since the database was first started.
Could you help me resolve this issue? I don't necessarily want to run the above query from Neo4j Desktop; eventually I want to run it with py2neo from Python using graph.run([iterate-query]). If there is a way of solving this from Python, that would be even better.
Thank you!
P.S. The debug log provides the following output (the last few iterations of the annotation, up until the shutdown):
2019-05-21 12:46:10.359+0000 INFO [c.g.n.p.p.AnnotatedTextPersister] Start storing annotatedText 251906
2019-05-21 12:46:13.784+0000 INFO [c.g.n.p.p.AnnotatedTextPersister] end storing annotatedText 251906. It took: 3425
2019-05-21 12:46:13.786+0000 INFO [c.g.n.e.EventDispatcher] Notifying listeners for event {}
2019-05-21 12:46:13.788+0000 INFO [c.g.n.e.EventDispatcher] Notifying listeners for event {}
2019-05-21 12:46:13.800+0000 INFO [c.g.n.u.ProcessorUtils] Taking default pipeline from configuration : myPipeline
2019-05-21 12:46:13.868+0000 INFO [c.g.n.p.s.StanfordTextProcessor] Time for pipeline annotation (myPipeline): 67. Text length: 954
2019-05-21 12:46:13.869+0000 INFO [c.g.n.NLPManager] Time to annotate 68
2019-05-21 12:46:13.869+0000 INFO [c.g.n.e.EventDispatcher] Notifying listeners for event {}
2019-05-21 12:46:13.869+0000 INFO [c.g.n.p.p.AnnotatedTextPersister] Start storing annotatedText 251907
2019-05-21 12:46:15.848+0000 INFO [c.g.n.p.p.AnnotatedTextPersister] end storing annotatedText 251907. It took: 1978
2019-05-21 12:46:15.848+0000 INFO [c.g.n.e.EventDispatcher] Notifying listeners for event {}
2019-05-21 12:46:15.862+0000 INFO [c.g.n.e.EventDispatcher] Notifying listeners for event {}
2019-05-21 12:46:15.915+0000 INFO [c.g.n.u.ProcessorUtils] Taking default pipeline from configuration : myPipeline
2019-05-21 12:46:16.294+0000 INFO [c.g.n.p.s.StanfordTextProcessor] Time for pipeline annotation (myPipeline): 378. Text length: 2641
2019-05-21 12:46:16.295+0000 INFO [c.g.n.NLPManager] Time to annotate 379
2019-05-21 12:46:16.296+0000 INFO [c.g.n.e.EventDispatcher] Notifying listeners for event {}
2019-05-21 12:46:16.296+0000 INFO [c.g.n.p.p.AnnotatedTextPersister] Start storing annotatedText 251908
2019-05-21 12:46:16.421+0000 INFO [o.n.k.a.DatabaseAvailabilityGuard] Database graph.db is unavailable.
2019-05-21 12:46:17.018+0000 INFO [c.g.s.f.b.GraphAwareServerBootstrapper] stopped
2019-05-21 12:46:17.020+0000 INFO [o.n.g.f.GraphDatabaseFacadeFactory] Shutdown started
2019-05-21 12:46:17.149+0000 INFO [o.n.g.f.GraphDatabaseFacadeFactory] Shutting down 'graph.db' database.
2019-05-21 12:46:17.150+0000 INFO [o.n.g.f.GraphDatabaseFacadeFactory] Shutdown started
2019-05-21 12:46:17.164+0000 INFO [o.n.b.i.BackupServer] BackupServer communication server shutting down and unbinding from /127.0.0.1:6362
2019-05-21 12:46:17.226+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by database shutdown @ txId: 7720 checkpoint started...
2019-05-21 12:46:17.247+0000 INFO [o.n.k.i.s.c.CountsTracker] Rotated counts store at transaction 7720 to [/Users/{my.user.name}/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-e2babea7-0332-4c2c-bf1d-076d4feed49a/installation-3.5.4/data/databases/graph.db/neostore.counts.db.a], from [/Users/{my.user.name}/Library/Application Support/Neo4j Desktop/Application/neo4jDatabases/database-e2babea7-0332-4c2c-bf1d-076d4feed49a/installation-3.5.4/data/databases/graph.db/neostore.counts.db.b].
2019-05-21 12:46:17.644+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Checkpoint triggered by database shutdown @ txId: 7720 checkpoint completed in 418ms
2019-05-21 12:46:17.647+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] No log version pruned, last checkpoint was made in version 3
2019-05-21 12:46:17.698+0000 INFO [o.n.i.d.DiagnosticsManager] --- STOPPING diagnostics START ---
2019-05-21 12:46:17.700+0000 INFO [o.n.i.d.DiagnosticsManager] --- STOPPING diagnostics END ---
2019-05-21 12:46:17.706+0000 INFO [c.g.r.BaseGraphAwareRuntime] Shutting down GraphAware Runtime...
2019-05-21 12:46:17.709+0000 INFO [c.g.r.m.BaseModuleManager] Shutting down module UIDM
2019-05-21 12:46:17.709+0000 INFO [c.g.r.m.BaseModuleManager] Shutting down module NLP
2019-05-21 12:46:17.712+0000 INFO [c.g.r.s.RotatingTaskScheduler] Terminating task scheduler...
2019-05-21 12:46:17.712+0000 INFO [c.g.r.s.RotatingTaskScheduler] Task scheduler terminated successfully.
2019-05-21 12:46:17.714+0000 INFO [c.g.r.BaseGraphAwareRuntime] GraphAware Runtime shut down.

Spring boot metrics shows HikariCP connection creation count 1, when HikariCP debug log's connection total is 2

I use Spring Boot 2.0.2 to build a web application with the default connection pool, HikariCP.
The HikariCP debug log shows the expected pool size of 2, but the Spring Boot metrics show a connection creation count of 1.
Did I misunderstand something?
Thanks in advance.
application.yml is below:
spring:
  datasource:
    minimum-idle: 2
    maximum-pool-size: 7
Log:
DEBUG 8936 --- [l-1 housekeeper] com.zaxxer.hikari.pool.HikariPool : HikariPool-1 - After cleanup stats (total=2, active=0, idle=2, waiting=0)
URL for metrics: http://localhost:8080/xxx/metrics/hikaricp.connections.creation
Response:
{
  name: "hikaricp.connections.creation",
  measurements: [
    {
      statistic: "COUNT",
      value: 1   <--- I think this should be 2
    },
    ...
  ]
}
What you are seeing is HikariCP's fail-fast check behaviour with regard to tracking metrics at this stage.
(I actually dug into this as I didn't know the answer beforehand)
At this stage a MetricsTracker isn't set yet, so the initial connection creation isn't counted. If the initial connection can be established, HikariCP simply keeps it. In your case, only the next connection creation is counted.
If you really want the metric value to be "correct", you can set spring.datasource.hikari.initialization-fail-timeout=-1. The behaviour is described in HikariCP's README under initializationFailTimeout.
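In application.yml form that would be (Hikari-specific settings bind under spring.datasource.hikari in Spring Boot 2):

spring:
  datasource:
    hikari:
      initialization-fail-timeout: -1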
Whether you really need a "correct" value is debatable, as you'll only miss that initial count. Ideally you'll want to reason about connection creation within a specific time window, e.g. the rate of connection creations per minute, to determine whether you are disposing of connections from the pool too early.
