FusionAuth incomplete reindex with AWS Elasticsearch - java

I am migrating FusionAuth search from a self-hosted Elasticsearch instance to an AWS Elasticsearch Service solution.
I have a new FusionAuth app EC2 instance, reading from the in-use database, that is configured to use the new Elasticsearch service.
On triggering a reindex from the new app instance, I see that only around 60k to 62.5k documents are written to the new index, when I am expecting roughly 6 million.
I see no errors from AWS's Elasticsearch Service, and in the app's logs I can see the following (endpoint intentionally omitted):
Feb 13, 2020 10:18:46.116 AM INFO io.fusionauth.api.service.search.ElasticSearchClientProvider - Connecting to FusionAuth Search Engine at [https://vpc-<<omitted>>.eu-west-1.es.amazonaws.com]
13-Feb-2020 11:19:55.176 INFO [http-nio-9011-exec-3] org.apache.coyote.http11.Http11Processor.service Error parsing HTTP request header
Note: further occurrences of HTTP header parsing errors will be logged at DEBUG level.
java.lang.IllegalArgumentException: Invalid character found in method name. HTTP method names must be tokens
at org.apache.coyote.http11.Http11InputBuffer.parseRequestLine(Http11InputBuffer.java:430)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:684)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:808)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1498)
"/usr/local/fusionauth/logs/fusionauth-app.log" [readonly] 43708L, 4308629C 42183,1 96%
at io.fusionauth.api.service.search.client.domain.documents.IndexUser.<init>(IndexUser.java:79)
at io.fusionauth.api.service.search.ElasticsearchSearchEngine.lambda$index$1(ElasticsearchSearchEngine.java:140)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at io.fusionauth.api.service.search.ElasticsearchSearchEngine.index(ElasticsearchSearchEngine.java:140)
at io.fusionauth.api.service.user.ReindexRunner$ReindexWorker.run(ReindexRunner.java:101)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-14" java.lang.NullPointerException
at io.fusionauth.api.service.search.client.domain.documents.IndexUser.<init>(IndexUser.java:79)
at io.fusionauth.api.service.search.ElasticsearchSearchEngine.lambda$index$1(ElasticsearchSearchEngine.java:140)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at io.fusionauth.api.service.search.ElasticsearchSearchEngine.index(ElasticsearchSearchEngine.java:140)
at io.fusionauth.api.service.user.ReindexRunner$ReindexWorker.run(ReindexRunner.java:101)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-13" java.lang.NullPointerException
Exception in thread "Thread-11" java.lang.NullPointerException
Exception in thread "Thread-12" java.lang.NullPointerException
Feb 18, 2020 10:23:29.064 AM INFO io.fusionauth.api.service.user.ReindexRunner - Reindex completed in [86797] ms or [86] seconds.
Although there are some exceptions, there is also a "Reindex completed" INFO log at the end.
Not knowing the ins and outs of Elasticsearch, I'm also not sure where to start investigating the NullPointerException.

It looks like the reindex operation is hitting an exception, which is likely the cause of the truncated index.
Exception in thread "Thread-14" java.lang.NullPointerException
at io.fusionauth.api.service.search.client.domain.documents.IndexUser.<init>(IndexUser.java:79)
This code assumes that you have a username or an email address, which should be enforced by the FusionAuth APIs. But in this case, for the exception to occur, both the email and the username must be NULL.
How did you get users into the database: the Import API, the User API, or something else?
In theory you should find at least one user with a NULL value for both the email and the username.
This query, or something similar, should find the offending user; then we need to identify how this user was added to FusionAuth.
SELECT email, username FROM identities WHERE email IS NULL OR username IS NULL
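For illustration only (this is not the actual FusionAuth source), a constructor of roughly this shape would throw a NullPointerException at the line in the stack trace whenever both fields are null:

// Hypothetical sketch of the failing pattern: when email is null, toLowerCase()
// is invoked on username, so a user with neither field set throws an NPE.
public class IndexUser {
    public String login;

    public IndexUser(String email, String username) {
        this.login = email != null ? email.toLowerCase() : username.toLowerCase();
    }
}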

Related

Mongo Connection issue with error "state should be: open"

I am running an event in an Akka actor system, where we run multiple actors to query MongoDB and retrieve data. Each actor queries for 1000 documents (each document's size is 9kb).
When running an event that is required to fire 14 actors to query MongoDB and retrieve 13000 documents, I experienced the exception below. I'm not sure why; has anyone experienced this before?
2020-04-14 19:17:28,818 [erp-writer-actor-system-akka.actor.default-dispatcher-378] ERROR c.a.s.c.m.GlobalContextMongoClientService- 76cd7a80-83ef-4389-885a-be9caed77449 - Exception occured while reading data from cursor
java.lang.IllegalStateException: state should be: open
at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70)
at com.mongodb.connection.DefaultServer.getConnection(DefaultServer.java:84)
at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getConnection(ClusterBinding.java:86)
at com.mongodb.operation.QueryBatchCursor.getMore(QueryBatchCursor.java:203)
at com.mongodb.operation.QueryBatchCursor.hasNext(QueryBatchCursor.java:103)
at com.mongodb.MongoBatchCursorAdapter.hasNext(MongoBatchCursorAdapter.java:46)
at com.xyz.smartconnect.commons.mongoclient.GlobalContextMongoClientService.findWorkers(GlobalContextMongoClientService.java:145)
at com.xyz.smartconnect.actors.QueryWorkersActor.lambda$createReceive$0(QueryWorkersActor.java:40)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
at akka.actor.Actor$class.aroundReceive(Actor.scala:513)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:132)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:519)
at akka.actor.ActorCell.invoke(ActorCell.scala:488)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Suppressed: java.lang.IllegalStateException: state should be: open
at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70)
at com.mongodb.connection.DefaultServer.getConnection(DefaultServer.java:84)
at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getConnection(ClusterBinding.java:86)
at com.mongodb.operation.QueryBatchCursor.killCursor(QueryBatchCursor.java:261)
at com.mongodb.operation.QueryBatchCursor.close(QueryBatchCursor.java:147)
at com.mongodb.MongoBatchCursorAdapter.close(MongoBatchCursorAdapter.java:41)
at com.xyz.smartconnect.commons.mongoclient.GlobalContextMongoClientService.findWorkers(GlobalContextMongoClientService.java:149)
After running multiple tests and analyzing the logs carefully, I found the root cause. Below are the details.
While the application was using a cursor to query data from MongoDB, the connection was released/closed. 'State should be: open' is complaining about a released connection.
In my case, the application hit an OutOfMemoryError, which caused the beans to be disposed and the connections to be released. Here is the timeline of log events for this issue.
Since this is a memory issue in my case, fixing the memory issue fixes the exception below.
2020-04-19 12:57:32,981 [xyz-actor-system-akka.actor.default-dispatcher-72] ERROR a.a.ActorSystemImpl- - 413f9298-ca92-4744-913b-59934e4ce831 - exception on LARS’ timer thread
java.lang.OutOfMemoryError: GC overhead limit exceeded
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:269)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
at java.lang.Thread.run(Thread.java:748)
2020-04-19 12:57:43,649 [Thread-19] INFO o.s.c.s.DefaultLifecycleProcessor- - - Stopping beans in phase 2147483647
2020-04-19 12:58:13,483 [Thread-19] INFO o.s.j.e.a.AnnotationMBeanExporter- - - Unregistering JMX-exposed beans on shutdown
2020-04-19 12:58:45,186 [localhost-startStop-2] INFO c.a.s.ApplicationContextListener- - - >>>>>>>>> Disposing beans
2020-04-19 12:59:00,182 [localhost-startStop-2] INFO c.a.s.c.SpringBeanDisposer- - - Mongo connections are released.
2020-04-19 12:59:09,591 [xyz-actor-system-akka.actor.default-dispatcher-73] ERROR c.a.s.c.m.GlobalContextMongoClientService- - 413f9298-ca92-4744-913b-59934e4ce831 - Exception occured while reading data from cursor
java.lang.IllegalStateException: state should be: open
at com.mongodb.assertions.Assertions.isTrue(Assertions.java:70)
at com.mongodb.connection.DefaultServer.getDescription(DefaultServer.java:114)
at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.getServerDescription(ClusterBinding.java:81)
at com.mongodb.operation.QueryBatchCursor.initFromCommandResult(QueryBatchCursor.java:251)
at com.mongodb.operation.QueryBatchCursor.getMore(QueryBatchCursor.java:207)
at com.mongodb.operation.QueryBatchCursor.hasNext(QueryBatchCursor.java:103)
at com.mongodb.MongoBatchCursorAdapter.hasNext(MongoBatchCursorAdapter.java:46)
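As a minimal sketch (using the current MongoDB sync Java driver; the database and collection names are placeholders), the pattern is to keep the MongoClient open for the entire lifetime of any cursor created from it and to close the cursor with try-with-resources. If the client is closed mid-iteration, for example during an OOM-triggered shutdown, the next getMore fails with exactly the "state should be: open" exception shown above.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import org.bson.Document;

public class CursorExample {
    public static void main(String[] args) {
        // Keep the client open as long as any cursor created from it is in use.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> workers =
                    client.getDatabase("erp").getCollection("workers"); // placeholder names
            // batchSize(1000) mirrors the 1000-documents-per-actor batches described above.
            try (MongoCursor<Document> cursor = workers.find().batchSize(1000).iterator()) {
                while (cursor.hasNext()) {
                    Document doc = cursor.next();
                    // Process doc here. If client.close() ran at this point (e.g. during an
                    // OOM shutdown), the next batch fetch would throw "state should be: open".
                }
            }
        }
    }
}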

Kafka Stream Subscription Error - Invalid Version

When attempting to connect to a topic from a Java Jetty microservice, I'm getting this Kafka internal version-mismatch error:
stream-thread [App-94d44dcd-f1d4-49a6-9dd3-8d4eee06f82a-StreamThread-1] Encountered the following error during processing:
java.lang.IllegalArgumentException: version must be between 1 and 3; was: 4
at org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfo.<init>(SubscriptionInfo.java:67)
at org.apache.kafka.streams.processor.internals.StreamsPartitionAssignor.subscription(StreamsPartitionAssignor.java:312)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.metadata(ConsumerCoordinator.java:176)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.sendJoinGroupRequest(AbstractCoordinator.java:515)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.initiateJoinGroup(AbstractCoordinator.java:466)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:412)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:352)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:337)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:333)
at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1218)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1175)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1154)
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:861)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:814)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Any ideas on what could cause such an exception?
I have come across this error myself, and it is most likely because you have used a non-unique APPLICATION_ID_CONFIG and/or CLIENT_ID_CONFIG:
Properties streamsConfiguration = new Properties();
// Give the Streams application a unique name. The name must be unique in the Kafka cluster
// against which the application is run.
streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
streamsConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG, "my-client");

Flink cluster deployed with "Fencing token not set exception"

What does this exception mean?
I am trying to deploy a Flink cluster (v1.5.2) with 3 nodes in HA mode (ZooKeeper).
I have the following flink-conf.yaml settings:
high-availability: zookeeper
high-availability.storageDir: /flink/ha
high-availability.zookeeper.quorum: {node1_ip}:2181,{node2_ip}:2181,{node3_ip}:2181
high-availability.jobmanager.port: 50010
high-availability.zookeeper.path.root: /flink
high-availability.zookeeper.path.namespace: /default_ns
The ZooKeeper cluster is running.
After start-cluster.sh is executed, I have only one working node. The other 2 nodes return
{"errors":["Could not retrieve the redirect address of the current leader. Please try to refresh."]} from the web UI
and this exception appears in flink-root-standalonesession-.log:
2018-08-07 18:55:22,081 ERROR org.apache.flink.runtime.rest.handler.legacy.files.StaticFileServerHandler - Could not retrieve the redirect address.
java.util.concurrent.CompletionException: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token not set: Ignoring message LocalFencedMessage(aec7a76447f8d44131605f5c10fb4fdc, LocalRpc
...
at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token not set: Ignoring message LocalFencedMessage(aec7a76447f8d44131605f5c10fb4fdc, LocalRpcInvocation(requestRestAddress(T
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:59)

Kinesis worker error: Caught exception when initializing LeaseCoordinator

I am trying to execute the sample producer and consumer code on the Kinesis Streams website: http://docs.aws.amazon.com/streams/latest/dev/learning-kinesis-module-one-download.html
I've downloaded the source, and I am using Eclipse to run it. I've included the necessary jar files, so I would think that everything would be set up to run.
When I run the processor code that consumes the records from Kinesis, however, I get this error:
Aug 02, 2016 8:35:14 PM com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker initialize SEVERE: Caught exception when initializing LeaseCoordinator
Can anyone tell me what is causing this error?
EDIT: Here is the full stack trace from the error on Eclipse:
Aug 02, 2016 9:02:27 PM com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker initialize
SEVERE: Caught exception when initializing LeaseCoordinator
com.amazonaws.services.kinesis.leases.exceptions.DependencyException: com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: User: arn:aws:sts::500238854089:assumed-role/NORD-NONPROD-a0121-Team/AEXM is not authorized to perform: dynamodb:CreateTable on resource: arn:aws:dynamodb:us-west-2:500238854089:table/amazon-kinesis-learning (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: AccessDeniedException; Request ID: BGME094FRUAEK2KFCPQIAM5U8VVV4KQNSO5AEMVJF66Q9ASUAAJG)
at com.amazonaws.services.kinesis.leases.impl.LeaseManager.createLeaseTableIfNotExists(LeaseManager.java:124)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibLeaseCoordinator.initialize(KinesisClientLibLeaseCoordinator.java:172)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker.initialize(Worker.java:380)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker.run(Worker.java:324)
at com.amazonaws.services.kinesis.samples.stocktrades.processor.StockTradesProcessor.main(StockTradesProcessor.java:96)
Caused by: com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: User: arn:aws:sts::500238854089:assumed-role/NORD-NONPROD-a0121-Team/AEXM is not authorized to perform: dynamodb:CreateTable on resource: arn:aws:dynamodb:us-west-2:500238854089:table/amazon-kinesis-learning (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: AccessDeniedException; Request ID: BGME094FRUAEK2KFCPQIAM5U8VVV4KQNSO5AEMVJF66Q9ASUAAJG)
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1401)
at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:945)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:723)
at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:475)
at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:437)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:386)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:2074)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:2044)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.createTable(AmazonDynamoDBClient.java:899)
at com.amazonaws.services.kinesis.leases.impl.LeaseManager.createLeaseTableIfNotExists(LeaseManager.java:117)
... 4 more
Your stack trace is telling you exactly what the problem is:
User: arn:aws:sts::500238854089:assumed-role/NORD-NONPROD-a0121-Team/AEXM is not authorized to perform: dynamodb:CreateTable on resource: arn:aws:dynamodb:us-west-2:500238854089:table/amazon-kinesis-learning
Make sure the credentials you've provided to the DynamoDB client have CreateTable permissions; the LeaseCoordinator attempts to create the lease table in DynamoDB.
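As a sketch against the KCL 1.x classes the learning sample uses (the profile name, stream name, and worker id below are assumptions for illustration), you can pass the worker a credentials provider whose IAM principal is allowed dynamodb:CreateTable, or pre-create the amazon-kinesis-learning lease table yourself:

import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibConfiguration;

public class KclCredentialsExample {
    public static void main(String[] args) {
        // The named profile (an assumption here) must be allowed dynamodb:CreateTable,
        // because the LeaseCoordinator creates the lease table on first run.
        AWSCredentialsProvider credentialsProvider =
                new ProfileCredentialsProvider("profile-with-dynamodb-create-table");
        KinesisClientLibConfiguration config = new KinesisClientLibConfiguration(
                "amazon-kinesis-learning",   // application name, also used as the lease table name
                "StockTradeStream",          // stream name (assumed from the sample)
                credentialsProvider,
                "worker-1");                 // arbitrary worker id
        // Pass 'config' to the Worker, as the sample's StockTradesProcessor already does.
    }
}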
It is actually possible to configure logging for Scala Kinesis Enrich by running the jar file like this:
java -jar -Dorg.slf4j.simpleLogger.defaultLogLevel=debug snowplow-kinesis-enrich-0.5.0 --config enrich.conf --resolver resolver.json
This should print all debug messages from the Kinesis Client Library. (Watch out because the output will become very verbose.) Could you try rerunning with this change to logging? Hopefully that will provide more clues about what's going wrong.

Java Quartz Ibatis Cron Issues

I have a Java webapp using an iBATIS row handler to load a very large dataset (1 million rows in an InnoDB table). The process is run as a nightly cron job by the Quartz scheduler. However, after it processes for 6 minutes, it dies with the following stack trace:
WARN [DefaultQuartzScheduler_Worker-8] MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(168) | Could not invoke method 'doBatch' on target object [org.myCron#4adb34]
org.springframework.jdbc.UncategorizedSQLException: SqlMapClient operation: encountered SQLException [
--- The error occurred in org/myCron/mySqlMap.xml.
--- The error occurred while applying a result map.
--- Check the mySqlMap.outputMapping.
--- The error happened while setting a property on the result object.
--- Cause: com.mysql.jdbc.CommunicationsException: Communications link failure due to underlying exception:
** BEGIN NESTED EXCEPTION **
java.io.EOFException
STACKTRACE:
java.io.EOFException
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1903)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2402)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2860)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:771)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1289)
at com.mysql.jdbc.RowDataDynamic.nextRecord(RowDataDynamic.java:362)
at com.mysql.jdbc.RowDataDynamic.next(RowDataDynamic.java:352)
at com.mysql.jdbc.ResultSet.next(ResultSet.java:6106)
at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:168)
at sun.reflect.GeneratedMethodAccessor71.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:592)
at com.ibatis.common.jdbc.logging.ResultSetLogProxy.invoke(ResultSetLogProxy.java:47)
at $Proxy10.next(Unknown Source)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.handleResults(SqlExecutor.java:380)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.handleMultipleResults(SqlExecutor.java:301)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.executeQuery(SqlExecutor.java:190)
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.sqlExecuteQuery(GeneralStatement.java:205)
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.executeQueryWithCallback(GeneralStatement.java:173)
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.executeQueryWithRowHandler(GeneralStatement.java:133)
at com.ibatis.sqlmap.engine.impl.SqlMapExecutorDelegate.queryWithRowHandler(SqlMapExecutorDelegate.java:649)
at com.ibatis.sqlmap.engine.impl.SqlMapSessionImpl.queryWithRowHandler(SqlMapSessionImpl.java:156)
at com.ibatis.sqlmap.engine.impl.SqlMapClientImpl.queryWithRowHandler(SqlMapClientImpl.java:133)
at org.springframework.orm.ibatis.SqlMapClientTemplate$5.doInSqlMapClient(SqlMapClientTemplate.java:267)
at org.springframework.orm.ibatis.SqlMapClientTemplate.execute(SqlMapClientTemplate.java:165)
at org.springframework.orm.ibatis.SqlMapClientTemplate.queryWithRowHandler(SqlMapClientTemplate.java:265)
at org.myCron.doBatch(MyCron.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:592)
at org.springframework.util.MethodInvoker.invoke(MethodInvoker.java:248)
at org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(MethodInvokingJobDetailFactoryBean.java:165)
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:66)
at org.quartz.core.JobRunShell.run(JobRunShell.java:191)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:516)
** END NESTED EXCEPTION **
The stack trace is very vague. The only hint I see is 'the error happened while setting a property on the result object'. There are only two properties on the result object: a String and an Integer. Both of them permit null values, but my select statements indicate that neither of them has any null values. Both have a proper getter/setter (which makes sense, since the process runs for a while successfully before dying). Every time the cron runs, it dies at a random point, so it isn't stuck on a particular row.
Note: the method 'doBatch' does exist, since that is the method that starts the cron process. If it couldn't find doBatch, it couldn't successfully process the first thousand rows.
I've also tried running the job outside of Quartz and it fails there as well. We tried increasing our MySQL net_read_timeout, net_write_timeout, and delayed_insert_timeout, but none of these settings helped with the problem. I also tried setting my log4j level to DEBUG and I did not get any helpful info.
Any other ideas about what I could try?
Sounds like MySQL closed the connection for some reason. Check the MySQL log to see if anything shows up, and turn on additional MySQL logging options if necessary.
Also, start printing debug data (including timestamps) from your app. Just print everything, then see what the last action was; perhaps you have some rarely triggered condition in your code that has a bug.
In other words, every single time you talk to MySQL, log it before AND after.
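For example, a small wrapper around the existing iBATIS row handler (the class name, statement name, and handler variable below are made up for illustration) would timestamp every row, so the last line logged before the EOFException shows how far the job got and how long the connection had been open:

import com.ibatis.sqlmap.client.event.RowHandler;

// Illustrative wrapper: delegates to the real row handler and logs a timestamp
// before and after each row, so the point of death is visible in the logs.
public class LoggingRowHandler implements RowHandler {
    private final RowHandler delegate;
    private long rowCount = 0;

    public LoggingRowHandler(RowHandler delegate) {
        this.delegate = delegate;
    }

    public void handleRow(Object valueObject) {
        rowCount++;
        System.out.println(System.currentTimeMillis() + " before row " + rowCount);
        delegate.handleRow(valueObject);
        System.out.println(System.currentTimeMillis() + " after row " + rowCount);
    }
}

// Usage inside doBatch(), assuming the existing handler variable is 'batchRowHandler'
// and 'myCron.selectBatch' stands in for your real statement name:
// sqlMapClientTemplate.queryWithRowHandler("myCron.selectBatch", new LoggingRowHandler(batchRowHandler));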
