I have a Java web application with a scheduled background job which sends a Kafka message.
I am using the default producer configuration and the job is very simple.
I am not assigning a producer id, so on each execution a new KafkaProducer is created with a new dynamic id.
When testing I noticed that the first execution of my job (after my application is up) always takes around 2 s, but subsequent executions take only a few ms.
I am not using any application cache, and I create a new instance of KafkaProducer on each execution.
Any explanation for this, please? Is there any static cache in the Kafka producer API? (I am always sending to the same topic and not specifying the partition in my producer record.)
We have a Java/Spring application which runs on EKS pods, and we have records stored in a MongoDB collection.
STATUS: READY, STARTED, COMPLETED
The application needs to pick the records which are in READY status and update the status to STARTED. Once the processing of a record is completed, the status is updated to COMPLETED.
Once a record is STARTED, it may take a few hours to complete; until then other pods (other instances of the same app) should not pick this record. If some exception occurs, the app changes the status back to READY so that other pods (or the same pod) can pick the READY record for processing.
Requirement: if the pod crashes while a record is being processed (STARTED), i.e. before the status is changed to READY/COMPLETED, another pod should be able to pick this record and start processing it again.
We have some solutions in mind but are trying to find the best one. Please suggest some good approaches.
You can use a shutdown hook from Spring:

@Component
public class Bean1 {

    @PreDestroy
    public void destroy() {
        // handle the database change
        System.out.println("Status changed to ready");
    }
}
Beyond that, this kind of job could work better in a messaging architecture, using SQS for example. Instead of using the status in the database to handle and orchestrate the task, you can publish the messages that need to be consumed (the ones that were in READY state) to an SQS queue and have a pool of workers consuming messages from that queue. If something crashes or one of the worker pods is reclaimed, the message goes back to SQS and can be consumed by another pod.
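A minimal sketch of such a worker, assuming the AWS SDK for Java v2 (the queue URL and the process() method are hypothetical placeholders). While a pod is working on a message it stays invisible to the other pods, and if the pod dies before deleting it, the message becomes visible again after the visibility timeout so another pod can pick it up:

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.*;

public class RecordWorker {

    // hypothetical queue URL
    private static final String QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/records-queue";

    public static void main(String[] args) {
        SqsClient sqs = SqsClient.create();
        while (true) {
            // long-poll for up to 10 messages; received messages become invisible to other pods
            ReceiveMessageResponse response = sqs.receiveMessage(ReceiveMessageRequest.builder()
                    .queueUrl(QUEUE_URL)
                    .maxNumberOfMessages(10)
                    .waitTimeSeconds(20)
                    .build());
            for (Message message : response.messages()) {
                process(message.body());
                // delete only after successful processing; otherwise the message reappears
                // after the visibility timeout and another pod can pick it up
                sqs.deleteMessage(DeleteMessageRequest.builder()
                        .queueUrl(QUEUE_URL)
                        .receiptHandle(message.receiptHandle())
                        .build());
            }
        }
    }

    private static void process(String body) {
        // placeholder for the actual record processing
    }
}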
I have an ArrayList containing 80 to 100 records and I am trying to stream and send each individual record (POJO, not the entire list) to a Kafka topic (Event Hub). A cron job is scheduled, e.g. every hour, to send these records (POJOs) to the Event Hub.
I am able to see messages being sent to the Event Hub, but after 3 or 4 successful runs I get the following exception (several messages are sent and several fail with the exception below):
Expiring 14 record(s) for eventhubname: 30125 ms has passed since batch creation plus linger time
The following is the producer config used:
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.ACKS_CONFIG, "1");
props.put(ProducerConfig.RETRIES_CONFIG, "3");
Message retention period - 7
Partitions - 6
Using Spring Kafka (2.2.3) to send the events.
The method where the Kafka send happens is marked as @Async:
@Async
protected void send() {
    kafkatemplate.send(record);
}
Expected - No exception to be thrown from Kafka.
Actual - org.apache.kafka.common.errors.TimeoutException is being thrown.
Prakash - we have seen a number of issues where spiky producer patterns see batch timeout.
The problem here is that the producer has two TCP connections that can go idle for > 4 mins - at that point, Azure load balancers close out the idle connections. The Kafka client is unaware that the connections have been closed so it attempts to send a batch on a dead connection, which times out, at which point retry kicks in.
Set connections.max.idle.ms to < 4mins – this allows Kafka client’s network client layer to gracefully handle connection close for the producer’s message-sending TCP connection
Set metadata.max.age.ms to < 4mins – this is effectively a keep-alive for the producer metadata TCP connection
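For example, against the producer Properties from the question (180000 ms = 3 minutes is just one possible value below the 4-minute limit):

props.put(ProducerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG, "180000"); // close idle connections before the Azure LB does
props.put(ProducerConfig.METADATA_MAX_AGE_CONFIG, "180000");        // periodic metadata refresh acts as a keep-alive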
Feel free to reach out to the EH product team on Github, we are fairly good about responding to issues - https://github.com/Azure/azure-event-hubs-for-kafka
This exception indicates you are queueing records faster than they can be sent. Once a record is added to a batch, there is a time limit for sending that batch, to ensure it is sent within a specified duration. This is controlled by the producer configuration parameter request.timeout.ms. If the batch has been queued longer than the timeout limit, the exception is thrown and the records in that batch are removed from the send queue.
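If you want to see exactly which records expired and re-queue them yourself, a send callback is one option. This is only a sketch using the plain KafkaProducer API rather than KafkaTemplate, where producer, record, retryQueue and log are assumed to exist in your code:

producer.send(record, (metadata, exception) -> {
    if (exception instanceof org.apache.kafka.common.errors.TimeoutException) {
        // the batch expired before it could be sent; buffer the record for a later attempt
        retryQueue.add(record);
    } else if (exception != null) {
        log.error("Send failed", exception);
    }
});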
Please check the following similar issue; it might help:
Kafka producer TimeoutException: Expiring 1 record(s)
You can also check this link: when-does-the-apache-kafka-client-throw-a-batch-expired-exception/34794261#34794261 for more details about the batch expired exception.
Also implement a proper retry policy.
Note that this does not account for any network issues on the sender side; with network issues you will not be able to send to the event hub at all.
Hope it helps.
I am running a test system that spawns a Kinesis producer which starts writing messages, e.g. 1 through 100, to a stream with two shards.
During that cycle a consumer starts to read the messages from the stream. I noticed that the consumer only reads the LATEST messages, i.e. those that arrive after it is running. So, for example, it starts reading at message 43. I tried modifying the Worker class to use the TRIM_HORIZON policy, but this doesn't seem to be working.
KinesisClientLibConfiguration c = new KinesisClientLibConfiguration("MediaPlan", "randeepstream",
DefaultAWSCredentialsProviderChain.getInstance(),
"consumer1")
.withInitialPositionInStream(InitialPositionInStream.TRIM_HORIZON);
final Worker w = new Worker.Builder()
.recordProcessorFactory(rpf)
.config(kinesisConfig)
.build();
new Thread(() -> w.run()).start();
My consumer's processor is set up as:
public class ConsumerRecordProcessorImpl implements IRecordProcessor {
public void initialize(InitializationInput initializationInput) {
log.info("Setting up consumer with shard {} starting at {}", initializationInput.getShardId(),
initializationInput.getExtendedSequenceNumber());
}
public void processRecords(ProcessRecordsInput processRecordsInput) {
...
}
}
I would expect to see a message like:
Setting up consumer with shard shardId-000000000000 starting at TRIM_HORIZON 0
but instead I get:
Setting up consumer with shard shardId-000000000000 starting at LATEST 0
How do I get my consumer to stop reading the latest and read all unprocessed messages?
Here is an example using the amazon-kinesis-client lib v2.
You will have to use the Scheduler (software.amazon.kinesis.coordinator), which reads records in the background; provide a retrieval config to this scheduler as follows:
RetrievalConfig retrievalConfig = setRetrievalConfig();

Scheduler scheduler = new Scheduler(
        configsBuilder.checkpointConfig(),
        configsBuilder.coordinatorConfig(),
        configsBuilder.leaseManagementConfig(),
        configsBuilder.lifecycleConfig(),
        configsBuilder.metricsConfig(),
        configsBuilder.processorConfig(),
        retrievalConfig);

private RetrievalConfig setRetrievalConfig() {
    InitialPositionInStreamExtended initialPositionInStreamExtended =
            InitialPositionInStreamExtended.newInitialPosition(InitialPositionInStream.TRIM_HORIZON);
    RetrievalConfig retrievalConfig = configsBuilder.retrievalConfig()
            .retrievalSpecificConfig(new PollingConfig(streamName, kinesisClient));
    retrievalConfig.initialPositionInStreamExtended(initialPositionInStreamExtended);
    return retrievalConfig;
}
Notice the InitialPositionInStream.TRIM_HORIZON: this tells the scheduler to start consuming records from the last known position. So even if the consumer is down while the producer keeps running, all the records produced during the consumer's downtime will be consumed.
NOTE: configsBuilder is an instance of ConfigsBuilder (software.amazon.kinesis.common).
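The Scheduler implements Runnable, so (mirroring the Worker usage in the question) you can start it on its own thread, for example:

Thread schedulerThread = new Thread(scheduler);
schedulerThread.setDaemon(true);
schedulerThread.start();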
UPDATE: the initialPositionInStream position won't be updated unless you call the checkpoint() API after processing the data you received from Kinesis.
So once you call the checkpointer, the latest position of the processed record is saved in DynamoDB, and KCL will process records from that position onward.
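As a rough sketch, the checkpoint call sits inside the KCL v2 ShardRecordProcessor (other methods omitted; log is an assumed logger):

@Override
public void processRecords(ProcessRecordsInput processRecordsInput) {
    processRecordsInput.records().forEach(record -> {
        // process the record payload here
    });
    try {
        // persist the position in DynamoDB so a restarted consumer resumes from here
        processRecordsInput.checkpointer().checkpoint();
    } catch (ShutdownException | InvalidStateException e) {
        log.error("Checkpoint failed", e);
    }
}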
Given the following scenario:
I bring up ZooKeeper and a single Kafka broker on my local machine and create a "test" topic as described in the Kafka quickstart: https://kafka.apache.org/quickstart
Then I run a simple Java program that produces a message to the "test" topic every second. After some time I bring down my local Kafka broker and see that the producer continues producing messages; it doesn't throw any exception. Finally, I bring the Kafka broker up again, the producer is able to reconnect, and it continues producing messages, but all the messages that were produced during the broker downtime are lost. The producer doesn't replay them when it detects that the broker is healthy again.
How can I prevent this? I want the Kafka producer to replay those messages when it detects that the broker is back online. Here is my producer config:
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("linger.ms", 0);
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", StringSerializer.class.getName());
The Kafka producer library has a retry mechanism built in; however, it is turned off by default. Change the retries producer config to a value bigger than 0 (the default value) to turn it on. You should also experiment with retry.backoff.ms and request.timeout.ms in order to customise producer retries.
Example Kafka producer config with retries enabled:
retries=2147483647 //Integer.MAX_VALUE
retry.backoff.ms=1000
request.timeout.ms=305000 //5 minutes
max.block.ms=2147483647 //Integer.MAX_VALUE
You can find more information about those properties in Apache Kafka documentation.
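In the props.put style used in the question, the same settings would look roughly like this:

props.put("retries", 2147483647);          // Integer.MAX_VALUE
props.put("retry.backoff.ms", 1000);
props.put("request.timeout.ms", 305000);
props.put("max.block.ms", 2147483647);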
Since you're running just one broker, I'm afraid you won't be able to store messages while your broker is down.
However, it is strange that you don't get any exception/warning/error when you bring your broker down.
I would expect a "Failed to update metadata" or "expiring messages" error, because when the producer sends messages to the broker(s) listed in the bootstrap.servers property, it first checks with ZooKeeper for the active controller (or leader) and partitions. So in your case, since you're running Kafka in stand-alone mode, when the broker is down the producer should not receive the leader information and should error out.
Could you please check what the following properties are set to:
request.timeout.ms
max.block.ms
and play around with these values (reducing them, maybe) and check the results?
One more option you might want to try is sending messages to Kafka in a synchronous fashion (blocking on the send() method until the messages are acknowledged). Here's a code snippet that might help (taken from this documentation reference):
If you want to simulate a simple blocking call you can call the get() method immediately:
byte[] key = "key".getBytes();
byte[] value = "value".getBytes();
ProducerRecord<byte[], byte[]> record = new ProducerRecord<byte[], byte[]>("my-topic", key, value);
producer.send(record).get();
In this case, kafka should throw an exception if the messages are not sent successfully for any reason.
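If you need to inspect that failure, a try/catch around the blocking call (a sketch; log is an assumed logger) exposes the underlying Kafka error as the cause of the ExecutionException:

try {
    producer.send(record).get();
} catch (ExecutionException e) {
    // e.getCause() is the actual Kafka error, e.g. a TimeoutException
    log.error("Send failed", e.getCause());
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}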
I hope this helps.
We have a Storm topology in which we have configured one spout and two bolts. The spout queries data from the DB continuously and sends tuples to the first bolt for some processing. The first bolt does some processing and sends tuples to the second bolt, which calls a third-party web service and sends the data. After some time, the last bolt stops getting any tuples, and if we restart the topology it works fine again. Only the last bolt has this problem; the spout and the first bolt keep running fine. I am not using the acking framework, and I have configured only one worker in this case.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("messageListenrSpout", new MessageListenerSpout(), 1);
builder.setBolt("processorBolt", new ProcessorBolt(), 20).shuffleGrouping("messageListenrSpout");
builder.setBolt("notifierBolt", new NotifierBolt(),40).shuffleGrouping("processorBolt");
Config conf = new Config();
conf.put(Config.TOPOLOGY_SLEEP_SPOUT_WAIT_STRATEGY_TIME_MS, 10000);
//conf.setMessageTimeoutSecs(600);
conf.setDebug(true);
StormSubmitter.submitTopology(TOPOLOGY, conf, builder.createTopology());
It's quite likely that you're having problems with a backlog of tuples causing timeouts. Try increasing the parallelism hint for the second bolt, since it sounds like its processing time is much longer than that of the first bolt (which is why a backlog builds up in front of the second bolt). If you're running this topology on a cluster, look at the Storm UI to see the specifics.
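For example, against the topology from the question (the parallelism hint of 100 is only an illustrative starting point; tune it using the numbers you see in the Storm UI):

// give the slow, web-service-calling bolt more executors so it can drain the backlog
builder.setBolt("notifierBolt", new NotifierBolt(), 100).shuffleGrouping("processorBolt");
// optionally spread the executors over more than one worker process
conf.setNumWorkers(2);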
Guys, when I was debugging my topology I found that if the spout sends messages faster than a bolt can process them, the messages queue up in the LMAX Disruptor queue and the spout tasks then wait for it to drain. If you take a thread dump, you will find the threads in TIMED_WAITING state. So we need to configure the topology in such a way that its inflow and outflow stay balanced.
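One common knob for balancing inflow and outflow is topology.max.spout.pending, which caps how many tuples the spout keeps in flight; note that it only takes effect when acking is enabled, so it would require turning the acking framework back on:

// cap in-flight tuples from the spout; 500 is just an illustrative value
conf.setMaxSpoutPending(500);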