I'm trying to set up the Debezium engine with MariaDB and ActiveMQ, using the Quarkus framework and following the official documentation (https://debezium.io/documentation/reference/development/engine.html). When I start the engine I get the following error:
2021-05-03 10:05:53,184 INFO [io.deb.pip.sou.AbstractSnapshotChangeEventSource] (debezium-mysqlconnector-my-app-connector-change-event-source-coordinator) Snapshot - Final stage
2021-05-03 10:05:53,184 WARN [io.deb.pip.ChangeEventSourceCoordinator] (debezium-mysqlconnector-my-app-connector-change-event-source-coordinator) Change event source executor was interrupted: java.lang.InterruptedException: Interrupted while emitting initial DROP TABLE events
I'm not really sure why this happens, and so far I haven't been able to track down the source of the problem, so any kind of help would be appreciated.
I was able to resolve this by deleting the file configured with the property offset.storage.file.filename.
// Run the engine asynchronously ...
ExecutorService executor = Executors.newSingleThreadExecutor();
executor.execute(engine);
// Do something else or wait for a signal or an event
Make sure you DO wait for something, or the connector thread will be terminated by the main thread, and you will get a message like "Snapshot was interrupted before completion".
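One way to wait is to block the main thread on a latch that a JVM shutdown hook releases after closing the engine. This is only a minimal sketch, assuming engine is the DebeziumEngine instance built earlier:

import java.io.IOException;
import java.util.concurrent.CountDownLatch;

// Block the main thread until shutdown is requested, so the connector
// thread running inside the executor is not terminated prematurely.
CountDownLatch shutdownLatch = new CountDownLatch(1);
Runtime.getRuntime().addShutdownHook(new Thread(() -> {
    try {
        engine.close(); // DebeziumEngine implements Closeable
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        shutdownLatch.countDown(); // release the waiting main thread
    }
}));
shutdownLatch.await(); // throws InterruptedException; this is the "wait for a signal" step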
We have a Java/Spring application that runs on EKS pods, and we have records stored in a MongoDB collection.
STATUS: READY, STARTED, COMPLETED
The application needs to pick records that are in READY status and update their status to STARTED. Once the processing of a record is complete, the status is updated to COMPLETED.
Once a record is STARTED, it may take a few hours to complete; until then, other pods (other instances of the same app) should not pick this record. If some exception occurs, the app changes the status back to READY so that other pods (or the same pod) can pick the READY record up for processing.
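For illustration, the pick-and-mark step can be done atomically with findOneAndUpdate, so only one pod can win each record. A simplified sketch (the collection and field names are hypothetical):

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.FindOneAndUpdateOptions;
import com.mongodb.client.model.ReturnDocument;
import com.mongodb.client.model.Updates;
import org.bson.Document;

// Atomically claim one READY record; concurrent pods cannot claim the same one.
Document claimed = records.findOneAndUpdate(
        Filters.eq("status", "READY"),
        Updates.combine(
                Updates.set("status", "STARTED"),
                Updates.currentDate("startedAt")), // when processing began
        new FindOneAndUpdateOptions().returnDocument(ReturnDocument.AFTER));
if (claimed != null) {
    // process the record, then update its status to COMPLETED
}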
Requirement: if a pod crashes while a record is processing (STARTED), i.e., before it can change the status back to READY or to COMPLETED, another pod should be able to pick up this record and start processing it again.
We have some solutions in mind but are trying to find the best one. Please help me with some good approaches.
You can use a shutdown hook from Spring:

@Component
public class Bean1 {

    @PreDestroy
    public void destroy() {
        // handle the database change here
        System.out.println("Status changed to READY");
    }
}
Beyond that, this kind of job could run better in a messaging architecture, using SQS for example. Instead of using the status in the database to handle and orchestrate the task, you can publish each message that needs to be consumed (the records that were in READY state) to an SQS queue and have a pool of workers consuming messages from it. If something crashes, or the pod running one of these workers is reclaimed, the message goes back to SQS and can be consumed by another pod.
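A minimal sketch of such a worker loop with the AWS SDK for Java v2 (the queue URL, timeouts, and processing method are placeholders): a message stays invisible to other consumers while it is being processed, and it is only deleted after success, so a crash lets another pod pick it up.

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

SqsClient sqs = SqsClient.create();
String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/records"; // placeholder

while (true) {
    ReceiveMessageRequest receive = ReceiveMessageRequest.builder()
            .queueUrl(queueUrl)
            .maxNumberOfMessages(10)
            .waitTimeSeconds(20)     // long polling
            .visibilityTimeout(3600) // hide the message while we work on it
            .build();
    for (Message message : sqs.receiveMessage(receive).messages()) {
        processRecord(message.body()); // hypothetical processing method
        // Delete only after successful processing; if the pod crashes first,
        // the message becomes visible again and another pod consumes it.
        sqs.deleteMessage(DeleteMessageRequest.builder()
                .queueUrl(queueUrl)
                .receiptHandle(message.receiptHandle())
                .build());
    }
}

Note that for work that takes hours, the visibility timeout has to be extended periodically (SQS caps it at 12 hours), so this fits best when the unit of work is reasonably short.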
I'm working with a Dataflow job with stateful processing and a timer. The process is simplified as below:
Receiving messages from a Pub/Sub subscription.
Keeping documents in a bagState.
Checking with a loop timer (processing time) whether all conditions are met;
if so, clearing the bagState and generating a new message for the next step.
Converting and sending the message to a Pub/Sub topic.
I haven't set any particular windowing policy, so I'm using the GlobalWindow (as I understand it).
When draining is performed (with messages continuously incoming at 1k/sec; I don't know if that could be related), the job raises this exception:
Error message from worker: java.lang.IllegalArgumentException: Attempted to set a processing-time timer with an output timestamp of 294247-01-10T04:00:54.775Z that is after the expiration of window 294247-01-09T04:00:54.775Z
with the related stack trace:
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$TimerInternalsTimer.setAndVerifyOutputTimestamp(SimpleDoFnRunner.java:1229)
org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$TimerInternalsTimer.setRelative(SimpleDoFnRunner.java:1138)
xxx.xxxxx.xxxxxxxxx.transform.AggregateLogsAuditFn.onLoopIteration(AggregateLogsAuditFn.java:292)
This error occurs when resetting the loop timer in the @ProcessElements method (or likewise in @OnTimer):
loopTimer.offset(Duration.standardSeconds(loopTimerSec.get())).setRelative();
The timer (and its interval value) are declared as:
@TimerId("loopTimer") private final TimerSpec loopTimer = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);
private ValueProvider<Integer> latenessTimerSec;
As an exception is raised, the job won't stop properly, and we need to cancel it.
Please note that updating the job (with --update) works fine, and this exception never appears while the job is running normally.
Thanks for your advice, Lionel.
Draining advances the watermark to "the end of time" (aka 294247-01-10), and the error indicates that a timer is being set to fire shortly after this. To make the loop timer compatible with drain, you should avoid setting it if the current time is this high.
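A minimal sketch of such a guard, reusing the timer and ValueProvider from the question (the one-day safety margin and method names are illustrative, not a definitive fix):

import org.apache.beam.sdk.options.ValueProvider;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.transforms.windowing.GlobalWindow;
import org.joda.time.Duration;
import org.joda.time.Instant;

// Inside the stateful DoFn (AggregateLogsAuditFn in the question):
@TimerId("loopTimer")
private final TimerSpec loopTimer = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);
private ValueProvider<Integer> loopTimerSec; // loop interval in seconds

@OnTimer("loopTimer")
public void onLoopIteration(OnTimerContext context,
                            @TimerId("loopTimer") Timer timer) {
    // GlobalWindow.INSTANCE.maxTimestamp() is the 294247-01-09 expiration shown
    // in the error message; after a drain the watermark jumps past it, so only
    // re-arm the timer while the current timestamp is safely below it.
    Instant windowEnd = GlobalWindow.INSTANCE.maxTimestamp();
    if (context.timestamp().isBefore(windowEnd.minus(Duration.standardDays(1)))) {
        timer.offset(Duration.standardSeconds(loopTimerSec.get())).setRelative();
    }
    // ... rest of the loop logic ...
}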
At medium to high load (test and production), when using the Vert.x Redis client, I get the following warning after a few hundred requests.
2019-11-22 11:30:02.320 [vert.x-eventloop-thread-1] WARN io.vertx.redis.client.impl.RedisClient - No handler waiting for message: [null, 400992, <data from redis>]
As a result, the handler supplied to the Redis call (see below) does not get called and the incoming request times out.
Handler<AsyncResult<String>> handler = res -> {
    // success handler
};
redis.get(key, res -> {
    handler.handle(res);
});
The real issue is that once the "No handler ..." warning comes up, the Redis client becomes useless, because all further calls to Redis made via the client fail with the same warning and the handler never gets called. I have an exception handler set on the client to attempt reconnection, but I do not see any reconnection being attempted.
How can one recover from this problem? Any workarounds to alleviate the severity would also be great.
I'm on vertx-core and vertx-redis-client 3.8.1.
The upcoming 4.0 release has addressed this issue, and a release should be happening soon; how soon, I can't really tell.
The problem is that we can't easily backport the fix from the master branch to the 3.8 branch, because a major refactoring has happened on the client and the codebases are very different.
The new code uses a connection pool and has been tested for concurrent access (which is where the issue you're seeing comes from). Under load, requests are routed across all event loops, and the queue that maintains the state between in-flight requests (requests sent to Redis) and waiting handlers could get out of sync under very specific conditions.
So I'd first try to see if you can already start moving your code to 4.0. You can have a try with the 4.0.0-milestone3 version, but to be totally fine, just have a run with the latest master, which has more issues solved in this area.
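For reference, a minimal sketch of what the same call looks like on the 4.x client (the connection string and key are placeholders); in 4.x the client manages its own connection pool, which is the code path where the 3.8 request/handler bookkeeping went out of sync:

import io.vertx.core.Vertx;
import io.vertx.redis.client.Redis;
import io.vertx.redis.client.RedisAPI;

Vertx vertx = Vertx.vertx();
Redis client = Redis.createClient(vertx, "redis://localhost:6379"); // placeholder address
RedisAPI redis = RedisAPI.api(client);

redis.get("my-key") // placeholder key
    .onSuccess(response -> System.out.println("value = " + response))
    .onFailure(err -> err.printStackTrace());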
Is it right to say that a Java gRPC server thread will still run even after the DEADLINE time, but the gRPC server will only stop/block that thread from making any subsequent gRPC calls once the DEADLINE time has been crossed?
If the above is a correct statement, is there a way to also stop/block the thread from making any Redis / DB calls once the DEADLINE time has been crossed? Or, once the DEADLINE time is crossed, to interrupt the thread immediately?
Is it right to say that a Java gRPC server thread will still run even after the DEADLINE time?
Correct. Java doesn't offer any real alternatives.
But the gRPC server will only stop/block that thread from making any subsequent gRPC calls once the DEADLINE time has been crossed?
Mostly. Outgoing gRPC calls observe the io.grpc.Context, which means deadlines and cancellations are propagated (unless you fail to propagate the Context to another thread or you use Context.fork()).
Is there a way to also stop/block the thread from making any Redis / DB calls once the DEADLINE time has been crossed? Or, once the DEADLINE time is crossed, to interrupt the thread immediately?
You can listen for Context cancellation via Context.addListener(). The gRPC server will cancel the Context when the deadline expires or when the client cancels the RPC. This notification is how outgoing RPCs are cancelled.
I will note that thread interruption is a bit involved to perform without racing. If you want interruption and don't already have a Future, I suggest wrapping your work in a FutureTask (and simply calling FutureTask.run() on the current thread) in order to get its race-free cancel(true) implementation.
import com.google.common.util.concurrent.MoreExecutors;
import io.grpc.Context;
import io.grpc.Context.CancellationListener;
import java.util.concurrent.FutureTask;

final FutureTask<Void> future = new FutureTask<Void>(work, null);
Context current = Context.current();
CancellationListener listener = new CancellationListener() {
    @Override public void cancelled(Context context) {
        future.cancel(true); // interrupts the thread running the task
    }
};
// addListener() takes an Executor; the direct executor runs the callback
// on whichever thread cancels the Context.
current.addListener(listener, MoreExecutors.directExecutor());
future.run();
current.removeListener(listener);
You can check Context.isCancelled() before making Redis / DB queries, and throw a StatusException(CANCELLED) if it has been cancelled.
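A short sketch of that check (the query call is a hypothetical placeholder):

import io.grpc.Context;
import io.grpc.Status;

if (Context.current().isCancelled()) {
    // Deadline expired or the client cancelled; skip the wasted round-trip.
    throw Status.CANCELLED.withDescription("RPC was cancelled").asRuntimeException();
}
queryRedis(); // hypothetical Redis / DB call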
I have a Quartz job like this:
@PersistJobDataAfterExecution
@DisallowConcurrentExecution
public class MyJob implements Job {

    public void execute(JobExecutionContext jec) throws JobExecutionException {
        // connect to an FTP server, monitor a directory for new files and download them
        // using FTPClient from commons-net-3.5.jar
    }
}
The job is triggered with:

JobDetail jobDetail = newJob(MyJob.class)
    .withIdentity(jobName, DEFAULT_GROUP)
    .usingJobData(new JobDataMap(jobProperties))
    .build();

// trigger every minute
Trigger trigger = newTrigger()
    .withIdentity(jobName, DEFAULT_GROUP)
    .startNow()
    .withSchedule(cronSchedule(cronExpression))
    .build();

scheduler.scheduleJob(jobDetail, trigger);
The job is triggered every minute. It works well for about one week (10,000 executions) and then inexplicably stops being relaunched. There are no errors in the log, and I can see that it completed its previous execution. The other jobs keep firing correctly.
Upgrading the libraries to quartz-2.2.3 and commons-net-3.5 (looking for a possible bug in the FTP library), I managed to make it last 3 weeks.
I have a job that monitors the Scheduler, and it says the trigger state is BLOCKED. The thread of the blocked job is not reused by the application server:
TriggerState triggerState = scheduler.getTriggerState(triggerKey);
I have not found documentation on this type of problem with Quartz, so my suspicion is a bug in the FTP library that interferes with the thread started by Quartz, for example in combination with the use of @PersistJobDataAfterExecution.
I wonder if it's a known issue or could be a bug, so I could apply a solution or a workaround (e.g. killing the Quartz job, as in "how to stop/interrupt quartz scheduler job manually").
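For the manual-interruption workaround, Quartz has a built-in mechanism: the job implements InterruptableJob, and a watchdog calls scheduler.interrupt() on its key. A minimal sketch (the flag handling and FTP cleanup are illustrative):

import org.quartz.InterruptableJob;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.UnableToInterruptJobException;

public class MyJob implements InterruptableJob {

    private volatile boolean interrupted = false;

    public void execute(JobExecutionContext jec) throws JobExecutionException {
        while (!interrupted) {
            // ... FTP polling work, checking the flag between steps ...
        }
    }

    // Invoked when something calls scheduler.interrupt(jobKey)
    public void interrupt() throws UnableToInterruptJobException {
        interrupted = true; // also disconnect the FTPClient here to unblock I/O
    }
}

A monitoring job that sees the trigger stuck in BLOCKED could then call scheduler.interrupt(new JobKey(jobName, DEFAULT_GROUP)) to ask the hung execution to stop.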
After months of occasional service drops, and suspecting that FTP connectivity errors were blocking the service, we finally implemented a measure that seems to solve the problem.
Each job execution now does:
FTPClient ftp = new FTPClient();
// Added a connection timeout before connect()
ftp.setDefaultTimeout(getTimeoutInMilliseconds());
ftp.connect(host, port);
// Added more timeouts (setSoTimeout() must be called after connect()) to see if the thread locks disappear...
ftp.setBufferSize(1024 * 1024);
ftp.setSoTimeout(getTimeoutInMilliseconds());
The weird thing is that the process was not previously blocked in connect(): it continued and ended without being relaunched. But since setting the timeouts, the problem has not happened again.