I have a consumer which reads data from a topic and spawns a thread for processing. At any point in time, multiple messages can be in process on the server. The application encountered DB timeouts, and all the messages being processed were lost. And since multiple threads were polling for a DB connection, the application threw an out-of-memory exception and went down.
How can I improve the architecture to avoid data loss even if the consumer goes down before it finishes processing?
You should do at-least-once processing by committing the offsets after you complete your processing, i.e. call
consumer.commitSync();
after the thread completes successfully.
Note that you also need to stop the consumer from committing offsets automatically by setting ‘enable.auto.commit’ to false.
You need to make sure, though, that your consumer is idempotent, i.e. if it fails and then reads and processes the same value again, this does not affect the outcome.
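To make the idempotency point concrete, here is a minimal JDK-only sketch. It assumes a message can be identified by topic, partition and offset; the class name, the in-memory set and the "balance" side effect are all hypothetical stand-ins for whatever your real processing and storage look like.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Toy idempotent handler: a message identified by topic/partition/offset is applied at most once. */
final class IdempotentHandler {
    // Hypothetical stand-in for a durable store (for example a DB table with a unique key).
    private final Set<String> processed = ConcurrentHashMap.newKeySet();
    private final Map<String, Integer> balances = new ConcurrentHashMap<>();

    /** Returns true if the message was applied, false if it was a redelivery and was skipped. */
    boolean handle(String topic, int partition, long offset, String account, int amount) {
        String messageId = topic + "-" + partition + "-" + offset;
        if (!processed.add(messageId)) {
            return false; // already seen: skip the side effect
        }
        balances.merge(account, amount, Integer::sum);
        return true;
    }

    int balanceOf(String account) {
        return balances.getOrDefault(account, 0);
    }
}
```

In a real system the "already processed" marker and the side effect would probably need to be written in the same database transaction, otherwise a crash between the two could still cause a duplicate or a lost update.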
You should commit the offset after getting a successful response from DB.
The issue is related to the number of available database connections and threads. A reliable way to handle it is to obtain a database connection first and only then pass that connection to the thread.
Thread Example
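import java.sql.Connection;
import java.util.concurrent.Callable;
import org.apache.kafka.clients.consumer.ConsumerRecord;

public class ConsumerThreadHandler implements Callable<Object> {
    private final ConsumerRecord<String, String> consumerRecord;
    private final Connection dataBaseConnection;

    public ConsumerThreadHandler(ConsumerRecord<String, String> consumerRecord, Connection dataBaseConnection) {
        this.consumerRecord = consumerRecord;
        this.dataBaseConnection = dataBaseConnection;
    }

    @Override
    public Object call() throws Exception {
        // Perform all the database-related work
        // and generate the proper response
        return null; // placeholder: return the real response here
    }
}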
Consumer Code
executor = new ThreadPoolExecutor(numberOfThreads, numberOfThreads, 0L, TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<>(), new ThreadPoolExecutor.CallerRunsPolicy());
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (final ConsumerRecord<String, String> record : records) {
        // Get a database connection: wait until one is available, or maintain a
        // connection pool and only move on when a connection is free.
        Future<?> future = executor.submit(new ConsumerThreadHandler(record, dataBaseConnection));
        if (future.isDone()) {
            // Based on the response, commit the offset
        }
    }
}
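One detail worth noting about the pool above: a ThreadPoolExecutor only rejects work when its queue is full, so with an unbounded LinkedBlockingQueue the CallerRunsPolicy never kicks in and tasks can pile up in memory. A minimal JDK-only sketch of the "get a connection first, then hand off" idea is to guard submission with a Semaphore holding one permit per connection. The class and method names here are hypothetical, and the sleep is only a placeholder for the real DB work.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Caps in-flight work at the number of available DB connections. */
final class BoundedDispatcher {
    private final Semaphore permits;               // one permit per DB connection
    private final ExecutorService workers;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxInFlight = new AtomicInteger();
    private final AtomicInteger completed = new AtomicInteger();

    BoundedDispatcher(int connections) {
        this.permits = new Semaphore(connections);
        this.workers = Executors.newFixedThreadPool(connections);
    }

    /** Blocks the polling thread until a permit is free, then hands the record to a worker. */
    void dispatch(String record) {
        permits.acquireUninterruptibly();
        workers.execute(() -> {
            try {
                int now = inFlight.incrementAndGet();
                maxInFlight.accumulateAndGet(now, Math::max);
                process(record);
                completed.incrementAndGet();
            } finally {
                inFlight.decrementAndGet();
                permits.release();                 // give the "connection" back
            }
        });
    }

    /** Placeholder for the real database work. */
    private void process(String record) {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Dispatches all records, waits for the workers, returns {completed, maxInFlight}. */
    static int[] runAll(List<String> records, int connections) {
        BoundedDispatcher d = new BoundedDispatcher(connections);
        for (String r : records) {
            d.dispatch(r);
        }
        d.workers.shutdown();
        try {
            d.workers.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { d.completed.get(), d.maxInFlight.get() };
    }
}
```

Because the polling thread blocks in dispatch() when all permits are taken, the number of records held in memory stays bounded instead of growing with every poll.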
You can go through the following simple example.
https://howtoprogram.xyz/2016/05/29/create-multi-threaded-apache-kafka-consumer/
I am running a test system that spawns a Kinesis producer which starts writing messages, e.g.: 1 through 100 to a stream with two shards.
During that cycle a consumer starts to read the messages from the stream. I noticed that the consumer only reads the LATEST messages that come into the stream after it's running. So for example, it starts reading at message 43. I tried modifying the Worker.class to use the TRIM_HORIZON Policy but this doesn't seem to be working.
KinesisClientLibConfiguration c = new KinesisClientLibConfiguration("MediaPlan", "randeepstream",
DefaultAWSCredentialsProviderChain.getInstance(),
"consumer1")
.withInitialPositionInStream(InitialPositionInStream.TRIM_HORIZON);
final Worker w = new Worker.Builder()
.recordProcessorFactory(rpf)
.config(kinesisConfig)
.build();
new Thread(() -> w.run()).start();
My consumer's processor is set up as:
public class ConsumerRecordProcessorImpl implements IRecordProcessor {
public void initialize(InitializationInput initializationInput) {
log.info("Setting up consumer with shard {} starting at {}", initializationInput.getShardId(),
initializationInput.getExtendedSequenceNumber());
}
public void processRecords(ProcessRecordsInput processRecordsInput) {
...
}
}
I would expect to see a message like:
Setting up consumer with shard shardId-000000000000 starting at TRIM_HORIZON 0
but instead I get:
Setting up consumer with shard shardId-000000000000 starting at LATEST 0
How do I get my consumer to stop reading the latest and read all unprocessed messages?
Here is an example using amazon-kinesis-client lib v2.
You will have to use Scheduler (software.amazon.kinesis.coordinator), which reads records in the background, and provide a retrieval config to this scheduler as follows:
RetrievalConfig retrievalConfig = setRetrievalConfig();
Scheduler scheduler = new Scheduler(
configsBuilder.checkpointConfig(),
configsBuilder.coordinatorConfig(),
configsBuilder.leaseManagementConfig(),
configsBuilder.lifecycleConfig(),
configsBuilder.metricsConfig(),
configsBuilder.processorConfig(),
retrievalConfig);
private RetrievalConfig setRetrievalConfig(){
InitialPositionInStreamExtended initialPositionInStreamExtended = InitialPositionInStreamExtended.newInitialPosition(InitialPositionInStream.TRIM_HORIZON);
RetrievalConfig retrievalConfig = configsBuilder.retrievalConfig().retrievalSpecificConfig(new PollingConfig(streamName, kinesisClient));
retrievalConfig.initialPositionInStreamExtended(initialPositionInStreamExtended);
return retrievalConfig;
}
Notice InitialPositionInStream.TRIM_HORIZON: it tells the scheduler to start from the oldest record still available in the stream when there is no checkpoint yet, and otherwise to continue after the last known position. So even if the consumer is down while the producer keeps running, all the records produced during the consumer's downtime will be consumed.
NOTE: configsBuilder is an object of ConfigsBuilder (software.amazon.kinesis.common)
UPDATE: The stored position won't be updated unless you call the checkpoint() API after processing the data you received from Kinesis.
Once you call checkpoint(), the latest position of the processed records is updated in DynamoDB, and KCL will process records from that position from then on.
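To illustrate why the log can show LATEST even after switching the configuration to TRIM_HORIZON, here is a toy model in plain Java. It is not the KCL API; the class, the in-memory map (standing in for the DynamoDB lease table) and the method names are all made up for illustration. The point it sketches is that the configured initial position only applies to a shard that has no stored checkpoint.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not the KCL API) of how the resume point for a shard is chosen. */
final class StartPositionModel {
    // Hypothetical stand-in for the lease table that stores one checkpoint per shard.
    private final Map<String, String> checkpoints = new HashMap<>();
    private final String initialPosition;

    StartPositionModel(String initialPosition) {
        this.initialPosition = initialPosition;
    }

    /** Records that everything up to sequenceNumber has been processed for this shard. */
    void checkpoint(String shardId, String sequenceNumber) {
        checkpoints.put(shardId, sequenceNumber);
    }

    /** A stored checkpoint wins; the configured initial position is only the fallback. */
    String resumePointFor(String shardId) {
        return checkpoints.getOrDefault(shardId, initialPosition);
    }
}
```

If that is what is happening in your case, an existing lease table created by an earlier run with LATEST would keep winning over the new setting until the application name changes or the table is cleared.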
I have a requirement where I read a bunch of rows (thousands) from a SQL DB using Spring Batch and call a REST Service to enrich content before writing them on a Kafka topic.
When using the Spring Reactive webClient, how do I limit the number of active non-blocking service calls? Should I somehow introduce a Flux in the loop after I read data using Spring Batch?
(I understand the usage of delayElements and that it serves a different purpose, as when a single Get Service Call brings in lot of data and you want the server to slow down -- here though, my use case is a bit different in that I have many WebClient calls to make and would like to limit the number of calls to avoid out of memory issues but still gain the advantages of non-blocking invocations).
Very interesting question. I pondered about it and I thought of a couple of ideas on how this could be done. I will share my thoughts on it and hopefully there are some ideas here that perhaps help you with your investigation.
Unfortunately, I'm not familiar with Spring Batch. However, this sounds like a problem of rate limiting, or the classical producer-consumer problem.
So, we have a producer that produces so many messages that our consumer cannot keep up, and the buffering in the middle becomes unbearable.
The problem I see is that your Spring Batch process, as you describe it, is not working as a stream or pipeline, but your reactive Web client is.
So, if we were able to read the data as a stream, then as records start getting into the pipeline those would get processed by the reactive web client and, using back-pressure, we could control the flow of the stream from producer/database side.
The Producer Side
So, the first thing I would change is how records get extracted from the database. We need to control how many records are read from the database at a time, either by paging our data retrieval or by controlling the fetch size, and then use back pressure to control how many of those are sent downstream through the reactive pipeline.
So, consider the following (rudimentary) database data retrieval, wrapped in a Flux.
Flux<String> getData(DataSource ds) {
return Flux.create(sink -> {
try {
Connection con = ds.getConnection();
con.setAutoCommit(false);
PreparedStatement stm = con.prepareStatement("SELECT order_number FROM orders WHERE order_date >= '2018-08-12'", ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
stm.setFetchSize(1000);
ResultSet rs = stm.executeQuery();
sink.onRequest(batchSize -> {
try {
for (int i = 0; i < batchSize; i++) {
if (!rs.next()) {
//no more data, close resources!
rs.close();
stm.close();
con.close();
sink.complete();
break;
}
sink.next(rs.getString(1));
}
} catch (SQLException e) {
//TODO: close resources here
sink.error(e);
}
});
}
catch (SQLException e) {
//TODO: close resources here
sink.error(e);
}
});
}
In the example above:
I control the amount of records we read per batch to be 1000 by setting a fetch size.
The sink will send the amount of records requested by the subscriber (i.e. batchSize) and then wait for it to request more using back pressure.
When there are no more records in the result set, then we complete the sink and close resources.
If an error occurs at any point, we send back the error and close resources.
Alternatively I could have used paging to read the data, probably simplifying the handling of resources by having to reissue a query at every request cycle.
You may consider also doing something if subscription is cancelled or disposed (sink.onCancel, sink.onDispose) since closing the connection and other resources is fundamental here.
The Consumer Side
On the consumer side you register a subscriber that requests messages in batches of 1000 at a time and only requests more once it has processed the current batch.
getData(source).subscribe(new BaseSubscriber<String>() {
    private int messages = 0;

    @Override
    protected void hookOnSubscribe(Subscription subscription) {
        subscription.request(1000);
    }

    @Override
    protected void hookOnNext(String value) {
        //make http request
        System.out.println(value);
        messages++;
        if (messages % 1000 == 0) {
            //when we're done with a batch
            //then we're ready to request more
            upstream().request(1000);
        }
    }
});
In the example above, when subscription starts it requests the first batch of 1000 messages. In the onNext we process that first batch, making http requests using the Web client.
Once the batch is complete, then we request another batch of 1000 from the publisher, and so on and so on.
And there you have it! Using back pressure you control how many open HTTP requests you have at a time.
My example is very rudimentary and it will require some extra work to make it production ready, but I hope it offers some ideas that can be adapted to your Spring Batch scenario.
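If you want to play with the request-a-batch pattern without pulling in Reactor, the same idea can be sketched with the JDK's own java.util.concurrent.Flow API (Java 9+). This is only an analogue of the BaseSubscriber above, not Reactor code: SubmissionPublisher stands in for the database Flux, and the class names and the list that replaces the HTTP call are assumptions of mine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

final class BatchRequestDemo {

    /** Requests items in fixed-size batches and asks for more only after a batch is handled. */
    static final class BatchingSubscriber implements Flow.Subscriber<String> {
        private final int batchSize;
        private final List<String> handled = new ArrayList<>();
        private final CountDownLatch finished = new CountDownLatch(1);
        private Flow.Subscription subscription;
        private int requestCalls = 0;
        private int seenInBatch = 0;

        BatchingSubscriber(int batchSize) {
            this.batchSize = batchSize;
        }

        @Override
        public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            requestBatch();                  // first batch
        }

        @Override
        public void onNext(String item) {
            handled.add(item);               // placeholder for the HTTP call
            if (++seenInBatch == batchSize) {
                seenInBatch = 0;
                requestBatch();              // batch done, ask for the next one
            }
        }

        @Override
        public void onError(Throwable t) {
            finished.countDown();
        }

        @Override
        public void onComplete() {
            finished.countDown();
        }

        private void requestBatch() {
            requestCalls++;
            subscription.request(batchSize);
        }
    }

    /** Publishes `total` items and returns {items handled, number of request(n) calls}. */
    static int[] run(int total, int batchSize) {
        BatchingSubscriber sub = new BatchingSubscriber(batchSize);
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(sub);
            for (int i = 0; i < total; i++) {
                publisher.submit("order-" + i);   // blocks if the subscriber's buffer is full
            }
        } // close() signals onComplete once the submitted items have been delivered
        try {
            sub.finished.await(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { sub.handled.size(), sub.requestCalls };
    }
}
```

One caveat that applies to this sketch and probably to the Reactor version too: counting onNext calls only bounds open requests if the HTTP call has finished by the time the counter is incremented. With a truly non-blocking client you would want to count completions rather than emissions.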
I have a multi-threaded app which uses a producer class to produce messages. Earlier I was using the code below, where a new KafkaProducer was built for each request:
KafkaProducer<String, byte[]> producer = new KafkaProducer<String, byte[]>(prop);
ProducerRecord<String, byte[]> data = new ProducerRecord<String, byte[]>(topic, objBytes);
producer.send(data, new Callback() {
@Override
public void onCompletion(RecordMetadata metadata, Exception exception) {
if (exception != null) {
isValidMsg[0] = false;
exception.printStackTrace();
saveOrUpdateLog(msgBean, producerType, exception);
logger.error("ERROR:Unable to produce message.",exception);
}
}
});
producer.close();
Then I read the Kafka docs on the producer and learned that we should use a single producer instance for good performance.
Then I created a single instance of KafkaProducer inside a singleton class.
Now, when and where should we close the producer? Obviously, if we close the producer after the first send request, later sends won't find an open producer and throw:
java.lang.IllegalStateException: Cannot send after the producer is closed.
Or how can we reconnect to the producer once it is closed?
And what happens if the program crashes or throws exceptions?
Generally, calling close() on the KafkaProducer is sufficient to make sure all inflight records have completed:
/**
* Close this producer. This method blocks until all previously sent requests complete.
* This method is equivalent to <code>close(Long.MAX_VALUE, TimeUnit.MILLISECONDS)</code>.
* <p>
* <strong>If close() is called from {@link Callback}, a warning message will be logged and close(0, TimeUnit.MILLISECONDS)
* will be called instead. We do this because the sender thread would otherwise try to join itself and
* block forever.</strong>
* <p>
*
* @throws InterruptException If the thread is interrupted while blocked
*/
If your producer is used throughout the lifetime of your application, don't close it until you get a termination signal, then call close(). As the documentation says, the producer is safe to use in a multi-threaded environment, and hence you should re-use the same instance.
If you're sharing your KafkaProducer in multiple threads, you have two choices:
Call close() while registering a shutdown callback via Runtime.getRuntime().addShutdownHook from your main execution thread
Have your multi-threaded methods race to close it and only allow a single one to win.
A rough sketch of 2 would possibly look like this:
object KafkaOwner {
  private var producer: KafkaProducer = ???
  @volatile private var isClosed = false
  def close(): Unit = {
    this.synchronized {
      if (!isClosed) {
        producer.close()
        isClosed = true
      }
    }
  }
  def instance: KafkaProducer = {
    this.synchronized {
      if (isClosed) {
        // recreate the producer if it was closed earlier
        producer = new KafkaProducer()
        isClosed = false
      }
      producer
    }
  }
}
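Since the question is in Java, here is a rough Java take on both options. It is a JDK-only sketch: AutoCloseable stands in for the shared KafkaProducer, and the class and method names are hypothetical. An AtomicBoolean lets exactly one caller win the race to close, and the same close() can be registered as a shutdown hook.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Holds one shared resource and guarantees that it is closed at most once. */
final class SharedResourceOwner {
    private final AutoCloseable resource;          // stand-in for the shared KafkaProducer
    private final AtomicBoolean closed = new AtomicBoolean(false);

    SharedResourceOwner(AutoCloseable resource) {
        this.resource = resource;
    }

    /** Option 1: close when the JVM receives a termination signal. */
    void closeOnShutdown() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::close));
    }

    /** Option 2: many threads may call this, only the first one actually closes. */
    void close() {
        if (closed.compareAndSet(false, true)) {
            try {
                resource.close();
            } catch (Exception e) {
                throw new IllegalStateException("close failed", e);
            }
        }
    }

    boolean isClosed() {
        return closed.get();
    }

    /** Races `threads` callers on close() and returns how often the resource was really closed. */
    static int raceClose(int threads) {
        AtomicInteger closeCalls = new AtomicInteger();
        SharedResourceOwner owner = new SharedResourceOwner(closeCalls::incrementAndGet);
        Thread[] callers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            callers[i] = new Thread(owner::close);
            callers[i].start();
        }
        for (Thread t : callers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return closeCalls.get();
    }
}
```

Unlike the Scala sketch, this version never recreates the resource after closing; whether you want that depends on whether the producer should really live for the whole lifetime of the application.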
As described in javadoc for KafkaProducer:
public void close()
Close this producer. This method blocks until all previously sent requests complete.
This method is equivalent to close(Long.MAX_VALUE, TimeUnit.MILLISECONDS).
src: https://kafka.apache.org/0110/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html#close()
So you don't need to worry that your messages won't be sent, even if you call close immediately after send.
If you plan to use a KafkaProducer more than once, then close it only after you've finished using it. If you still want to have the guarantee that your message is actually sent before your method completes and not waiting in a buffer, then use KafkaProducer#flush() which will block until current buffer is sent. You can also block on Future#get() if you prefer.
There is also one caveat to be aware of if you don't plan to ever close your KafkaProducer (e.g. in short-lived apps, where you just send some data and the app immediately terminates after sending). The KafkaProducer IO thread is a daemon thread, which means the JVM will not wait until this thread finishes to terminate the VM. So, to ensure that your messages are actually sent use KafkaProducer#flush(), no-arg KafkaProducer#close() or block on Future#get().
The Kafka producer is supposed to be thread safe and frugal with its thread pool. You might want to use
producer.flush();
instead of
producer.close();
leaving the producer open until program termination or until you're sure you won't need it any more.
If you still want to close the producer, then recreate it on demand.
producer = new KafkaProducer<String, byte[]>(prop);
After creating multiple consumers (using the Kafka 0.9 Java API) and starting each thread, I'm getting the following exception:
Consumer has failed with exception: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed due to group rebalance
class com.messagehub.consumer.Consumer is shutting down.
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed due to group rebalance
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:546)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:487)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:681)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:654)
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:350)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:288)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:303)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:197)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:187)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:157)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:352)
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:936)
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:905)
After that the consumers start consuming messages normally. I would like to know what is causing this exception in order to fix it.
Try also to tweak the following parameters:
heartbeat.interval.ms - The expected time between heartbeats sent to the group coordinator. It works together with session.timeout.ms, which is how long Kafka waits without a heartbeat before it considers the consumer "dead".
max.partition.fetch.bytes - This limits the amount of data per partition (and therefore, indirectly, the number of messages) the consumer will receive when polling.
I noticed that the rebalancing occurs if the consumer does not commit to Kafka before the heartbeat times out. If the commit occurs after the messages are processed, the time needed to process them determines how these parameters should be set. So, decreasing the number of messages and allowing more time before the heartbeat times out will help to avoid rebalancing.
Also consider using more partitions, so there will be more threads processing your data, even with fewer messages per poll.
I wrote this small application to make tests. Hope it helps.
https://github.com/ajkret/kafka-sample
UPDATE
Kafka 0.10.x now offers a new parameter to control the number of messages received:
- max.poll.records - The maximum number of records returned in a single call to poll().
UPDATE
Kafka offers a way to pause consumption. While the consumer is paused, you can process the messages in a separate thread, which leaves you free to keep calling KafkaConsumer.poll() to send heartbeats. Then call KafkaConsumer.resume() after the processing is done. This mitigates rebalances caused by not sending heartbeats. Here is an outline of what you can do:
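while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Integer.MAX_VALUE);
    consumer.commitSync();
    // pause() takes the partitions to pause; the exact signature depends on the client version
    consumer.pause(consumer.assignment());
    for (ConsumerRecord<String, String> record : records) {
        Future<Boolean> future = workers.submit(() -> {
            // Process
            return true;
        });
        while (true) {
            try {
                if (future.get(1, TimeUnit.SECONDS) != null) {
                    break;
                }
            } catch (java.util.concurrent.TimeoutException e) {
                // still busy: poll while paused so the consumer stays in the group
                consumer.poll(0);
            }
        }
    }
    consumer.resume(consumer.assignment());
}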
Two possible reasons -->
If there are any network failures, consumers cannot reach out to the broker and will throw this exception. But there were no network failures when these exceptions occurred.
As mentioned in the error trace, if too much time is spent on processing the messages, the ConsumerCoordinator will lose the connection and the commit will fail. This happens because the consumer does not get back to calling poll() in time.
The values given here are the default Kafka consumer configuration values.
request.timeout.ms=40000
heartbeat.interval.ms=3000
max.poll.interval.ms=300000
max.poll.records=500
session.timeout.ms=10000
Solution -->
I reduced max.poll.records to 100, but the exception still occurred sometimes. So I changed the configuration as below:
request.timeout.ms=300000
heartbeat.interval.ms=1000
max.poll.interval.ms=900000
max.poll.records=100
session.timeout.ms=600000
I reduced the heartbeat interval so that the broker is told more frequently that the consumer is active, and I also increased the session timeout.
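A quick way to sanity-check such values is to work out the time budget they imply per record: a batch of max.poll.records has to be finished before max.poll.interval.ms runs out, otherwise the next poll() comes too late. The small helper below is just that arithmetic; the class and method names are my own, not part of any Kafka API.

```java
/** Back-of-the-envelope budget for processing one poll() batch. */
final class PollBudget {

    /** Average time each record may take so that a full batch finishes before the next poll() is due. */
    static long maxMillisPerRecord(long maxPollIntervalMs, int maxPollRecords) {
        return maxPollIntervalMs / maxPollRecords;
    }

    /** Largest max.poll.records that still fits the interval for a given per-record cost. */
    static long maxRecordsPerPoll(long maxPollIntervalMs, long millisPerRecord) {
        return maxPollIntervalMs / millisPerRecord;
    }
}
```

With the defaults listed above (300000 ms, 500 records) each record may take 600 ms on average; with the changed values (900000 ms, 100 records) the budget grows to 9000 ms per record. If your processing takes about 2 seconds per record, the default interval would only leave room for about 150 records per poll.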
First, I'll explain the situation and the logic that I'm trying to implement:
I have multiple threads, each of which puts the result of its work, an object called Result, into a queue, QueueToSend.
My NettyClient runs in its own thread, takes a Result from QueueToSend every 1 millisecond, and should connect to the server and send a message that is created from that Result.
I also need these connections to be asynchronous. So the NettyHandler needs to know the Result list in order to send the right message, process the right result, and then send the response.
So I initialize NettyClient bootstrap
bootstrap = new ClientBootstrap(
new NioClientSocketChannelFactory(
Executors.newCachedThreadPool(),
Executors.newCachedThreadPool()));
and set the pipeline once when the app starts.
Then, every millisecond I take a Result object from QueueToSend and connect to the server
ChannelFuture future = bootstrap.connect(new InetSocketAddress(host, port));
ResultConcurrentHashMap.put(future.getChannel().getId(), result);
I decided to use a static ConcurrentHashMap to save every Result object taken from QueueToSend, associated with its channel.
The first problem occurs in the NettyHandler method channelConnected, when I try to get the Result object associated with the channel from ResultConcurrentHashMap.
@Override
public void channelConnected(ChannelHandlerContext ctx, ChannelStateEvent e) {
    Channel channel = ctx.getPipeline().getChannel();
    Result result = ResultConcurrentHashMap.get(channel.getId());
}
But sometimes result is null (about 1 in 50 times), even though it should be in ResultConcurrentHashMap. I think this happens because the channelConnected event fires before NettyClient runs this code:
ResultConcurrentHashMap.put(future.getChannel().getId(), result);
Maybe it would not appear if I ran NettyServer and NettyClient on different machines rather than both on localhost, since it would take more time to establish the connection. But I need a solution for this issue.
Another issue is that I am sending messages every 1 millisecond asynchronously, and I suppose that the messages may get mixed up so that the server cannot read them properly. If I run them one by one it is OK:
future.getChannel().getCloseFuture().awaitUninterruptibly();
But I need asynchronous sending, and I need to process the right results associated with each channel and send responses.
What should I implement?
ChannelFutures are completed asynchronously, before the corresponding events get fired. For example, the channel connect future will be completed before the channel connected event is fired.
So you have to register a channel future listener after calling bootstrap.connect() and write your code in the listener to initialize the HashMap, then it will be visible to the handler.
ChannelFuture channelFuture = bootstrap.connect(remoteAddress, localAddress);
channelFuture.addListener(new ChannelFutureListener() {
    @Override
    public void operationComplete(ChannelFuture future) throws Exception {
        // result must be final (or effectively final) to be used here
        resultConcurrentHashMap.put(future.getChannel().getId(), result);
    }
});
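If you would rather not depend on the ordering between the listener and the handler at all, one alternative (a different technique from the listener above, and not part of the Netty API) is to make the hand-off order-independent: both sides look up a CompletableFuture per channel id, and whichever side arrives first creates it. A minimal JDK-only sketch, with a hypothetical ResultRegistry class:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Order-independent hand-off between the connecting code and the handler. */
final class ResultRegistry<R> {
    private final ConcurrentMap<Integer, CompletableFuture<R>> slots = new ConcurrentHashMap<>();

    // Whichever side asks first creates the slot; the other side finds the same future.
    private CompletableFuture<R> slot(int channelId) {
        return slots.computeIfAbsent(channelId, id -> new CompletableFuture<>());
    }

    /** Called by the connecting side once it knows the channel id. */
    void publish(int channelId, R result) {
        slot(channelId).complete(result);
    }

    /** Called by the handler; completes whenever publish() happens, before or after this call. */
    CompletableFuture<R> await(int channelId) {
        return slot(channelId);
    }

    /** Drop the entry when the channel is closed so the map does not grow forever. */
    void remove(int channelId) {
        slots.remove(channelId);
    }
}
```

In the handler you would then attach a callback such as registry.await(channel.getId()).thenAccept(...) instead of reading the map directly, so a late put no longer shows up as a null.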