We have written a Java client for publishing messages to Kafka. The code is shown below:
Properties props = new Properties();
props.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "202.xx.xx.xxx:9092");
props.setProperty(ProducerConfig.METADATA_FETCH_TIMEOUT_CONFIG, Integer.toString(5 * 1000));
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

// 1. create KafkaProducer
KafkaProducer<String, String> producer = new KafkaProducer<>(props);

// 2. create callback; only report an error when the send actually failed
Callback callback = new Callback() {
    public void onCompletion(RecordMetadata metadata, Exception e) {
        if (e != null) {
            System.out.println("Error while sending data");
            e.printStackTrace();
        }
    }
};

// 3. send an example record to the HelloWorld topic (the topic seen in the logs below)
ProducerRecord<String, String> record = new ProducerRecord<>("HelloWorld", "Hello World");
producer.send(record, callback);
When we execute this code, we get the following messages and exception:
ProducerConfig values:
compression.type = none
metric.reporters = []
metadata.max.age.ms = 300000
metadata.fetch.timeout.ms = 5000
acks = 1
batch.size = 16384
reconnect.backoff.ms = 10
bootstrap.servers = [202.xx.xx.xx:9092]
receive.buffer.bytes = 32768
retry.backoff.ms = 100
buffer.memory = 33554432
timeout.ms = 30000
key.serializer = class org.apache.kafka.common.serialization.StringSerializer
retries = 0
max.request.size = 1048576
block.on.buffer.full = true
value.serializer = class org.apache.kafka.common.serialization.StringSerializer
metrics.sample.window.ms = 30000
send.buffer.bytes = 131072
max.in.flight.requests.per.connection = 5
metrics.num.samples = 2
linger.ms = 0
client.id =
Updated cluster metadata version 1 to Cluster(nodes = [Node(202.xx.xx.xx, 9092)], partitions = [])
Starting Kafka producer I/O thread.
The configuration metadata.broker.list = null was supplied but isn't a known config.
The configuration request.required.acks = null was supplied but isn't a known config.
Kafka producer started
Trying to send metadata request to node -1
Init connection to node -1 for sending metadata request in the next iteration
Initiating connection to node -1 at 202.xx.xx.xx:9092.
Trying to send metadata request to node -1
Completed connection to node -1
Trying to send metadata request to node -1
Sending metadata request ClientRequest(expectResponse=true, payload=null, request=RequestSend(header= {api_key=3,api_version=0,correlation_id=0,client_id=producer-1}, body={topics=[HelloWorld]})) to node -1
Updated cluster metadata version 2 to Cluster(nodes = [Node(0, 192.local, 9092)], partitions = [Partition(topic = HelloWorld, partition = 0, leader = 0, replicas = [0,], isr = [0,]])
Initiating connection to node 0 at 192.local:9092.
0 max latency = 219 ms, avg latency = 0.00022
1 records sent in 219 ms. 4.57 records per second (0.00 mb/sec).
Error connecting to node 0 at 192.local:9092:
java.io.IOException: Can't resolve address: 192.local:9092
at org.apache.kafka.common.network.Selector.connect(Selector.java:138)
at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:417)
at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:116)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:165)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
at java.lang.Thread.run(Unknown Source)
Caused by: java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Unknown Source)
at sun.nio.ch.SocketChannelImpl.connect(Unknown Source)
at org.apache.kafka.common.network.Selector.connect(Selector.java:135)
... 5 more
Beginning shutdown of Kafka producer I/O thread, sending remaining records.
Initiating connection to node 0 at 192.local:9092.
Error connecting to node 0 at 192.local:9092:
java.io.IOException: Can't resolve address: 192.local:9092
at org.apache.kafka.common.network.Selector.connect(Selector.java:138)
at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:417)
at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:116)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:165)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135)
at java.lang.Thread.run(Unknown Source)
Caused by: java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Unknown Source)
at sun.nio.ch.SocketChannelImpl.connect(Unknown Source)
at org.apache.kafka.common.network.Selector.connect(Selector.java:135)
... 5 more
Give up sending metadata request since no node is available
This happens in an infinite loop and the application hangs. When we checked the Kafka broker, we found that the topic was created, but we did not get the message. We have been stuck on this for a while. Please help.
We finally figured out the issue. We were running Kafka in a hybrid environment, as mentioned in the following post:
https://medium.com/#thedude_rog/running-kafka-in-a-hybrid-cloud-environment-17a8f3cfc284
We changed host.name to the internal IP and advertised.host.name to the external IP.
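For reference, a minimal sketch of the broker-side server.properties change (the IP addresses are placeholders; host.name and advertised.host.name apply to older broker versions that still use these properties, while newer brokers use listeners and advertised.listeners instead):
# Broker binds on the internal address
host.name=10.0.0.5
# Clients are given the external address in metadata responses
advertised.host.name=202.xx.xx.xxx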
I am developing a streaming application with Quarkus. My application looks as follows:
1. flatMap to change the key and generate multiple messages from a single message.
2. join with a KTable using the key from Step 1.
3. transform for a stateful operation.
4. flatMap to change the key back to the original key, i.e., the one before Step 1.
5. groupBy with the key as in Step 4, which is actually the one from Step 1.
6. reduce to "merge" the records into a single message comprising a JSON array.
The net effect is to split an incoming message (with key id1) into multiple messages (with different keys, e.g., k1, k2, etc.), enhance each of those messages using join and transform, change the key of each message back to id1, and finally "merge" the enhanced messages into a single message with key id1.
I keep getting an error asking me to set up a default key serde and value serde. While the default serde can be set in application.properties, I am not clear why this error arises in the first place.
Note that if I do not do Step 5 and Step 6, the application works successfully.
This is the Java exception I get.
2022-10-17 16:42:34,884 ERROR [org.apa.kaf.str.KafkaStreams] (app-alerts-6a7c4df8-7813-4d5d-9a86-d6f3db7c8ef0-StreamThread-1) stream-client [app-alerts-6a7c4df8-7813-4d5d-9a86-d6f3db7c8ef0] Encountered the following exception during processing and the registered exception handler opted to SHUTDOWN_CLIENT. The streams client is going to shut down now. : org.apache.kafka.streams.errors.StreamsException: org.apache.kafka.common.config.ConfigException: Please specify a key serde or set one through StreamsConfig#DEFAULT_KEY_SERDE_CLASS_CONFIG
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:627)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:551)
Caused by: org.apache.kafka.common.config.ConfigException: Please specify a key serde or set one through StreamsConfig#DEFAULT_KEY_SERDE_CLASS_CONFIG
at org.apache.kafka.streams.StreamsConfig.defaultKeySerde(StreamsConfig.java:1587)
at org.apache.kafka.streams.processor.internals.AbstractProcessorContext.keySerde(AbstractProcessorContext.java:90)
at org.apache.kafka.streams.processor.internals.SerdeGetter.keySerde(SerdeGetter.java:47)
at org.apache.kafka.streams.kstream.internals.WrappingNullableUtils.prepareSerde(WrappingNullableUtils.java:63)
at org.apache.kafka.streams.kstream.internals.WrappingNullableUtils.prepareKeySerde(WrappingNullableUtils.java:90)
at org.apache.kafka.streams.state.internals.MeteredKeyValueStore.initStoreSerde(MeteredKeyValueStore.java:195)
at org.apache.kafka.streams.state.internals.MeteredKeyValueStore.init(MeteredKeyValueStore.java:144)
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.registerStateStores(ProcessorStateManager.java:212)
at org.apache.kafka.streams.processor.internals.StateManagerUtil.registerStateStores(StateManagerUtil.java:97)
at org.apache.kafka.streams.processor.internals.StreamTask.initializeIfNeeded(StreamTask.java:231)
at org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:454)
at org.apache.kafka.streams.processor.internals.StreamThread.initializeAndRestorePhase(StreamThread.java:865)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:747)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:589)
... 1 more
These are StreamsConfig values:
acceptable.recovery.lag = 10000
application.id = machine-alerts
application.server =
bootstrap.servers = [kafka:9092]
buffered.records.per.partition = 1000
built.in.metrics.version = latest
cache.max.bytes.buffering = 10240
client.id =
commit.interval.ms = 1000
connections.max.idle.ms = 540000
default.deserialization.exception.handler = class org.apache.kafka.streams.errors.LogAndFailExceptionHandler
default.dsl.store = rocksDB
default.key.serde = null
default.list.key.serde.inner = null
default.list.key.serde.type = null
default.list.value.serde.inner = null
default.list.value.serde.type = null
default.production.exception.handler = class org.apache.kafka.streams.errors.DefaultProductionExceptionHandler
default.timestamp.extractor = class org.apache.kafka.streams.processor.FailOnInvalidTimestamp
default.value.serde = null
max.task.idle.ms = 0
max.warmup.replicas = 2
metadata.max.age.ms = 500
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = DEBUG
metrics.sample.window.ms = 30000
num.standby.replicas = 0
num.stream.threads = 1
poll.ms = 100
probing.rebalance.interval.ms = 600000
processing.guarantee = at_least_once
rack.aware.assignment.tags = []
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
repartition.purge.interval.ms = 30000
replication.factor = -1
request.timeout.ms = 40000
retries = 0
retry.backoff.ms = 100
rocksdb.config.setter = null
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
state.cleanup.delay.ms = 600000
state.dir = /tmp/kafka-streams
task.timeout.ms = 300000
topology.optimization = none
upgrade.from = null
window.size.ms = null
windowed.inner.class.serde = null
windowstore.changelog.additional.retention.ms = 86400000
I don't know how Quarkus works; however, the serde error on groupBy statements is common when you start using Kafka Streams. The groupBy statement creates an internal repartition topic where the data is grouped by key (internally, your streams application sends messages to that topic). For that reason, you should specify the serdes for your groupBy statement at the code level whenever the key or value types differ from the default key and value serdes in your streams properties.
KGroupedStream<String, User> groupedStream = stream.groupByKey(
        Serialized.with(
                Serdes.String(),       /* key */
                new CustomUserSerde()  /* value */
        )
);
In the above example, I'm using the desired Serdes in the group statement, not at the property level.
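On recent Kafka Streams versions (where the deprecated Serialized class has been removed), the same idea is expressed with Grouped, and the reduce in Step 6 takes its serdes via Materialized. A minimal sketch of Steps 5 and 6 only, assuming String keys, the merged JSON array carried as a String value, and a hypothetical mergeJsonArrays helper:
// "enrichedStream" stands for the stream after Step 4, already keyed by the original id1 (assumption)
KTable<String, String> merged = enrichedStream
        // Step 5: Grouped supplies the serdes used for the internal repartition topic created by groupBy
        .groupBy((key, value) -> key, Grouped.with(Serdes.String(), Serdes.String()))
        // Step 6: Materialized supplies the serdes for the reduce state store and its changelog topic
        .reduce((aggregate, next) -> mergeJsonArrays(aggregate, next),
                Materialized.with(Serdes.String(), Serdes.String()));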
I am using the following method in order to truncate data from an Aerospike namespace.set.bins:
// Setting LUT
val calendar = Calendar.getInstance()
calendar.setTimeInMillis(startTime + 1262304000000L) // uses CITRUSLEAF_EPOCH - see https://discuss.aerospike.com/t/how-to-use-view-and-calulate-last-update-time-lut-for-the-truncate-command/4330
logger.info(s"truncate($startTime = ${calendar.getTime}, durableDelete = $durableDelete) on ${config.toRecoverMap}")

// Define Scan and Write Policies
val writePolicy = new WritePolicy()
val scanPolicy = new ScanPolicy()
writePolicy.durableDelete = durableDelete
scanPolicy.filterExp = Exp.build(Exp.le(Exp.lastUpdate(), Exp.`val`(calendar)))

// Scan all records such that LUT <= startTime
config.toRecoverMap.flatMap { case (namespace, mapOfSetsToBins) =>
  for ((set, bins) <- mapOfSetsToBins) yield {
    val recordCount = new AtomicInteger(0)
    client.scanAll(scanPolicy, namespace, set, new ScanCallback() {
      override def scanCallback(key: Key, record: Record): Unit = {
        val requiresNullify = bins.filter(record.bins.containsKey(_)).distinct // Instead of making bulk requests which may not be needed and would load AS
        if (requiresNullify.nonEmpty) {
          client.put(writePolicy, key, requiresNullify.map(Bin.asNull): _*)
          logger.debug(s"${recordCount.incrementAndGet()}: (${requiresNullify.mkString(",")}) Bins of Record: $record with $key are set to NULL")
        }
      }
    })
    logger.info(s"Totally $recordCount records affected during the truncate operation on $namespace.$set.$bins")
    recordCount.get
  }
}
}
This fails on:
...
2021-08-08 16:51:30,551 [Aerospike-6] DEBUG c.d.a.c.r.services.AerospikeService.scanCallback(55) - 33950: (IsActive) Bins of Record: (gen:3),(exp:0),(bins:(IsActive:0)) with test-recovery-set-multi-1:null:95001b26e70dbb35e1487802ebbc857eceb92246 are set to NULL
with the following reason:
Error -11,6,0,30000,0,5: Max retries exceeded: 5
com.aerospike.client.AerospikeException: Error -11,6,0,30000,0,5: Max retries exceeded: 5
at com.aerospike.client.query.PartitionTracker.isComplete(PartitionTracker.java:282)
at com.aerospike.client.command.ScanExecutor.scanPartitions(ScanExecutor.java:70)
at com.aerospike.client.AerospikeClient.scanAll(AerospikeClient.java:1519)
at com.aerospike.connect.reloader.services.AerospikeService.$anonfun$truncate$3(AerospikeService.scala:50)
at com.aerospike.connect.reloader.services.AerospikeService.$anonfun$truncate$3$adapted(AerospikeService.scala:48)
at scala.collection.Iterator$$anon$9.next(Iterator.scala:575)
at scala.collection.immutable.List.prependedAll(List.scala:153)
at scala.collection.immutable.List$.from(List.scala:651)
at scala.collection.immutable.List$.from(List.scala:648)
at scala.collection.IterableFactory$Delegate.from(Factory.scala:288)
at scala.collection.immutable.Iterable$.from(Iterable.scala:35)
at scala.collection.immutable.Iterable$.from(Iterable.scala:32)
at scala.collection.IterableOps$WithFilter.map(Iterable.scala:884)
at com.aerospike.connect.reloader.services.AerospikeService.$anonfun$truncate$1(AerospikeService.scala:48)
at scala.collection.StrictOptimizedIterableOps.flatMap(StrictOptimizedIterableOps.scala:117)
at scala.collection.StrictOptimizedIterableOps.flatMap$(StrictOptimizedIterableOps.scala:104)
at scala.collection.immutable.Map$Map1.flatMap(Map.scala:241)
at com.aerospike.connect.reloader.services.AerospikeService.truncate(AerospikeService.scala:47)
at com.aerospike.connect.reloader.tests.services.AerospikeServiceSpec.$anonfun$new$2(AerospikeServiceSpec.scala:23)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.wordspec.AnyWordSpecLike$$anon$3.apply(AnyWordSpecLike.scala:1077)
at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
at com.aerospike.connect.reloader.tests.services.AerospikeServiceSpec.withFixture(AerospikeServiceSpec.scala:13)
at org.scalatest.wordspec.AnyWordSpecLike.invokeWithFixture$1(AnyWordSpecLike.scala:1075)
at org.scalatest.wordspec.AnyWordSpecLike.$anonfun$runTest$1(AnyWordSpecLike.scala:1087)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at org.scalatest.wordspec.AnyWordSpecLike.runTest(AnyWordSpecLike.scala:1087)
at org.scalatest.wordspec.AnyWordSpecLike.runTest$(AnyWordSpecLike.scala:1069)
at com.aerospike.connect.reloader.tests.services.AerospikeServiceSpec.runTest(AerospikeServiceSpec.scala:13)
at org.scalatest.wordspec.AnyWordSpecLike.$anonfun$runTests$1(AnyWordSpecLike.scala:1146)
at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
at scala.collection.immutable.List.foreach(List.scala:333)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:390)
at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:427)
at scala.collection.immutable.List.foreach(List.scala:333)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
at org.scalatest.wordspec.AnyWordSpecLike.runTests(AnyWordSpecLike.scala:1146)
at org.scalatest.wordspec.AnyWordSpecLike.runTests$(AnyWordSpecLike.scala:1145)
at com.aerospike.connect.reloader.tests.services.AerospikeServiceSpec.runTests(AerospikeServiceSpec.scala:13)
at org.scalatest.Suite.run(Suite.scala:1112)
at org.scalatest.Suite.run$(Suite.scala:1094)
at com.aerospike.connect.reloader.tests.services.AerospikeServiceSpec.org$scalatest$BeforeAndAfterAll$$super$run(AerospikeServiceSpec.scala:13)
at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
at com.aerospike.connect.reloader.tests.services.AerospikeServiceSpec.org$scalatest$wordspec$AnyWordSpecLike$$super$run(AerospikeServiceSpec.scala:13)
at org.scalatest.wordspec.AnyWordSpecLike.$anonfun$run$1(AnyWordSpecLike.scala:1191)
at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
at org.scalatest.wordspec.AnyWordSpecLike.run(AnyWordSpecLike.scala:1191)
at org.scalatest.wordspec.AnyWordSpecLike.run$(AnyWordSpecLike.scala:1189)
at com.aerospike.connect.reloader.tests.services.AerospikeServiceSpec.run(AerospikeServiceSpec.scala:13)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13(Runner.scala:1320)
at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13$adapted(Runner.scala:1314)
at scala.collection.immutable.List.foreach(List.scala:333)
at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1314)
at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24(Runner.scala:993)
at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24$adapted(Runner.scala:971)
at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1480)
at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:971)
at org.scalatest.tools.Runner$.run(Runner.scala:798)
at org.scalatest.tools.Runner.run(Runner.scala)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2or3(ScalaTestRunner.java:38)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:25)
Any ideas why this is happening?
The LUT method:
def calculateCurrentLUT(): Long = {
  logger.info("calculateCurrentLUTs() Triggered")
  val policy = new WritePolicy()
  policy.setTimeout(config.operationTimeoutInMillis)
  val key = new Key(config.toRecover.head.namespace, AerospikeConfiguration.dummySetName, AerospikeConfiguration.dummyKey)
  client.put(policy, key, new Bin(AerospikeConfiguration.dummyBin, "Used by the Recovery process to calculate current machine startTime"))
  client.execute(policy, key, AerospikeConfiguration.packageName, "getLUT").asInstanceOf[Long]
}
with:
def registerUDFs(): RegisterTask = {
  logger.info(s"registerUDFs() Triggered")
  val policy = new WritePolicy()
  policy.setTimeout(config.operationTimeoutInMillis)
  client.registerUdfString(policy, """
    |function getLUT(r)
    |  return record.last_update_time(r)
    |end
    |""".stripMargin, AerospikeConfiguration.packageName + ".lua", Language.LUA)
}
AerospikeException: Error -11,6,0,30000,0,5: Max retries exceeded: 5 breaks down as follows: -11 is the error code, meaning the maximum retry attempts for this operation exceeded the specified value. The 6 shows the number of iterations (original attempt plus max retries), and you specified max retries of 5. Your connection settings are: 0 for connectTimeout (the wait to create the initial socket, 0 is the default), 30000 (30 s) is the time after which an idle socket is closed, 0 is the total timeout for this scan operation (0 means don't time out, which is correct for scans), and 5 is the number of retries. It looks like the server is not responding to the client's scan call within 30 seconds, so the client closes the idle socket and retries, and after 5 retries it throws the exception. Something is obviously wrong; check the server log for more clues. For example, are you using a server version that supports expressions for scans? Second, I would check your computation of the LUT comparison expression: if the filter expression evaluates to false for every record, the scan will simply return EOF on completion (no matching records), but if the socket times out before that, the scan will go into a retry.
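If you just want the scan to be more tolerant while you investigate, the relevant knobs are on the scan policy. A minimal sketch using the Aerospike Java client (the question's code is Scala, startTimeMillis is hypothetical, and the timeout values here are illustrative, not recommendations):
// Same LUT cutoff idea as in the question (Citrusleaf epoch offset)
Calendar cutoff = Calendar.getInstance();
cutoff.setTimeInMillis(startTimeMillis + 1262304000000L);

ScanPolicy scanPolicy = new ScanPolicy();
scanPolicy.filterExp = Exp.build(Exp.le(Exp.lastUpdate(), Exp.val(cutoff)));

// Fields inherited from Policy that map onto the numbers in the error above:
scanPolicy.socketTimeout = 30000;       // per-call socket timeout in ms (the 30000 in the error)
scanPolicy.totalTimeout = 0;            // 0 = no overall deadline for the scan
scanPolicy.maxRetries = 5;              // the 5 in "Max retries exceeded: 5"
scanPolicy.sleepBetweenRetries = 500;   // optional back-off between retries, in ms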
I am trying to implement manual offset commit for the messages received from Kafka. I have set auto commit to false, but the offset value keeps on increasing.
I am not sure what the reason is and need help resolving the issue.
Below is the code
application.yml
spring:
application:
name: kafka-consumer-sample
resources:
cache:
period: 60m
kafka:
bootstrapServers: localhost:9092
options:
enable:
auto:
commit: false
KafkaConfig.java
@Bean
public ConsumerFactory<String, String> consumerFactory() {
    Map<String, Object> config = new HashMap<>();
    config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    config.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    config.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    config.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    config.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    return new DefaultKafkaConsumerFactory<>(config);
}

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    return factory;
}
KafkaConsumer.java
@Service
public class KafkaConsumer {

    @KafkaListener(topics = "#{'${kafka-consumer.topics}'.split(',')}", groupId = "${kafka-consumer.groupId}")
    public void consume(ConsumerRecord<String, String> record) {
        System.out.println("Consumed Kafka Record: " + record);
        record.timestampType();
        System.out.println("record.timestamp() = " + record.timestamp());
        System.out.println("***********************************");
        System.out.println(record.timestamp());
        System.out.println("record.key() = " + record.key());
        System.out.println("Consumed String Message : " + record.value());
    }
}
The output is as follows:
Consumed Kafka Record: ConsumerRecord(topic = test, partition = 0, offset = 31, CreateTime = 1573570989565, serialized key size = -1, serialized value size = 2, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 10)
record.timestamp() = 1573570989565
***********************************
1573570989565
record.key() = null
Consumed String Message : 10
Consumed Kafka Record: ConsumerRecord(topic = test, partition = 0, offset = 32, CreateTime = 1573570991535, serialized key size = -1, serialized value size = 2, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = 11)
record.timestamp() = 1573570991535
***********************************
1573570991535
record.key() = null
Consumed String Message : 11
The consumer properties are as follows:
auto.commit.interval.ms = 100000000
auto.offset.reset = earliest
bootstrap.servers = [localhost:9092]
check.crcs = true
connections.max.idle.ms = 540000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = mygroup
heartbeat.interval.ms = 3000
This is after I restart the consumer. I expected the earlier data to be printed as well.
Is my understanding correct?
Please note that I am restarting my Spring Boot app expecting the messages to start from the beginning; my Kafka server and ZooKeeper are not terminated.
If auto acknowledgement is disabled via the ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG property, then you have to set the acknowledgement mode at the container level to MANUAL if you do not want the container to commit the offsets for you, because by default it is set to BATCH.
@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.getContainerProperties().setAckMode(AbstractMessageListenerContainer.AckMode.MANUAL);
    return factory;
}
This is because when auto acknowledgement is disabled, the container-level acknowledgement mode defaults to BATCH:
public void setAckMode(ContainerProperties.AckMode ackMode)
Set the ack mode to use when auto ack (in the configuration properties) is false.
RECORD: Ack after each record has been passed to the listener.
BATCH: Ack after each batch of records received from the consumer has been passed to the listener
TIME: Ack after this number of milliseconds (should be greater than #setPollTimeout(long) pollTimeout).
COUNT: Ack after at least this number of records have been received
MANUAL: Listener is responsible for acking - use a AcknowledgingMessageListener.
Parameters:
ackMode - the ContainerProperties.AckMode; default BATCH.
Committing Offsets
Several options are provided for committing offsets. If the enable.auto.commit consumer property is true, Kafka auto-commits the offsets according to its configuration. If it is false, the containers support several AckMode settings (described in the next list). The default AckMode is BATCH. Starting with version 2.3, the framework sets enable.auto.commit to false unless explicitly set in the configuration. Previously, the Kafka default (true) was used if the property was not set.
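With AckMode.MANUAL (or MANUAL_IMMEDIATE), the listener receives an Acknowledgment and decides when to commit. A minimal sketch of what the listener could look like; the topic and group names are taken from the question's output and properties, the rest is illustrative:
@KafkaListener(topics = "test", groupId = "mygroup")
public void consume(ConsumerRecord<String, String> record, Acknowledgment ack) {
    System.out.println("Consumed String Message : " + record.value());
    // The offset is committed only when acknowledge() is called; if you skip it,
    // consumption resumes from the last committed offset after a restart.
    ack.acknowledge();
}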
And if you always want to read from the beginning, you have to set the auto.offset.reset property to earliest:
config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
Note: make sure the groupId is a new one that does not already have any committed offsets in Kafka.
I'm using Kafka producer 10.2.1 to create a topic and to write to the topic. When I create the topic I get the following error, but the topic is created:
java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
at org.apache.kafka.clients.producer.KafkaProducer$FutureFailure.<init>(KafkaProducer.java:774)
at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:494)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:440)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:360)
at kafka.AvroProducer.produce(AvroProducer.java:47)
at samples.TestMqttSource.messageReceived(TestMqttSource.java:89)
at mqtt.JsonConsumer.messageArrived(JsonConsumer.java:132)
at org.eclipse.paho.client.mqttv3.internal.CommsCallback.deliverMessage(CommsCallback.java:477)
at org.eclipse.paho.client.mqttv3.internal.CommsCallback.handleMessage(CommsCallback.java:380)
at org.eclipse.paho.client.mqttv3.internal.CommsCallback.run(CommsCallback.java:184)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
msg org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
loc org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
cause org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
excep java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
All suggestions are highly appreciated.
You can't use KafkaProducer to create a topic (so I'm not quite sure how you managed to create the topic, unless you did it previously via a different method, such as the Kafka admin shell scripts). Instead, you use the AdminUtils class supplied by the Kafka library.
I recently achieved both of the requirements you are after, and you'd be surprised how easy it is to achieve. Below is a simple code example showing you how to create a topic via AdminUtils, and how to then write to it.
class Foo {

    private String TOPIC = "testingTopic";
    private int NUM_OF_PARTITIONS = 10;
    private int REPLICATION_FACTOR = 1;

    public Foo() {
        ZkClient zkClient = new ZkClient("localhost:2181", 15000, 10000, ZKStringSerializer$.MODULE$);
        ZkUtils zkUtils = new ZkUtils(zkClient, new ZkConnection("localhost:2181"), false);

        if (!AdminUtils.topicExists(zkUtils, TOPIC)) {
            try {
                AdminUtils.createTopic(zkUtils, TOPIC, NUM_OF_PARTITIONS, REPLICATION_FACTOR, new Properties(), RackAwareMode.Enforced$.MODULE$);

                Properties producerConfig = new Properties();
                producerConfig.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
                producerConfig.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
                producerConfig.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
                KafkaProducer<String, String> producer = new KafkaProducer<>(producerConfig);

                // This is just to show you how to write, but you could be more elaborate
                int i = 0;
                while (i < 11) {
                    ProducerRecord<String, String> rec = new ProducerRecord<>(TOPIC, "This is line number " + i);
                    producer.send(rec);
                    i++;
                }
                producer.close();
            } catch (AdminOperationException aoe) {
                aoe.printStackTrace();
            }
        }
    }
}
Remember that if you want to delete topics, this is disabled by default in the broker settings. In the config file you use when starting Kafka (by default ${kafka_home}/config/server.properties), add the following line if it doesn't already exist, or fix it if it is set to false or commented out:
delete.topic.enable=true
You'll then have to restart the server and can delete topics either via Java or the command line tools supplied.
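For completeness, the Java-side deletion goes through the same AdminUtils class used above; a sketch, reusing the zkUtils and TOPIC from the example:
// Marks the topic for deletion; requires delete.topic.enable=true on the brokers
AdminUtils.deleteTopic(zkUtils, TOPIC);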
NB
It's always a good idea to close producers / consumers when you are finished with them, as shown in the code example.
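On newer Kafka clients (0.11 and later), topics can also be created without going through ZooKeeper by using the AdminClient API. A minimal sketch, assuming a broker at localhost:9092 and the same topic settings as above:
Properties adminProps = new Properties();
adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

try (AdminClient admin = AdminClient.create(adminProps)) {
    // Name, partition count and replication factor mirror the AdminUtils example above
    NewTopic topic = new NewTopic("testingTopic", 10, (short) 1);
    admin.createTopics(Collections.singleton(topic)).all().get();  // blocks until the broker responds
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
}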
I have what seems to be a simple Flume configuration that is giving me a lot of problems. Let me first describe the problem and then I'll list the configuration files.
I have 3 servers: Server1, Server2, Server3.
Server1:
Netcat source / Syslogtcp source (I tested this on both netcat with no acks and syslogtcp)
2 memory channels
2 Avro sinks (one per channel)
Replicating selector with second memory channel optional
Server2,3:
Avro source
memory channel
Kafka sink
In my simulation, Server2 is simulating "production" and thus cannot experience any data loss whereas Server3 is simulating "development" and data loss is fine.
My assumption is that using 2 channels and 2 sinks will decouple the two servers from each other, and that if Server3 goes down it won't affect Server2 (especially with the optional channel configuration!). However, this is not the case. When I run my simulations and terminate Server3 with CTRL-C, I experience a slowdown on Server2 and the output to the Kafka sink from Server2 slows to a crawl. When I resume the Flume agent on Server3, everything goes back to normal.
I didn't expect this behavior. What I expected was that because I have two channels and two sinks, if one channel and/or sink goes down, the other channel and/or sink shouldn't have a problem. Is this a limitation of Flume? Is this a limitation of my sources, sinks, or channels? Is there a way to configure Flume so that a single agent with multiple channels and sinks keeps them decoupled from each other? I really don't want to have multiple Flume agents on one machine for each "environment" (production and development). Attached are my config files so you can see what I did in a more technical way:
SERVER1 (FIRST TIER AGENT)
#Describe the top level configuration
agent.sources = mySource
agent.channels = defaultChannel1 defaultChannel2
agent.sinks = mySink1 mySink2
#Describe/configure the source
agent.sources.mySource.type = netcat
agent.sources.mySource.port = 6666
agent.sources.mySource.bind = 0.0.0.0
agent.sources.mySource.max-line-length = 150000
agent.sources.mySource.ack-every-event = false
#agent.sources.mySource.type = syslogtcp
#agent.sources.mySource.host = 0.0.0.0
#agent.sources.mySource.port = 7103
#agent.sources.mySource.eventSize = 150000
agent.sources.mySource.channels = defaultChannel1 defaultChannel2
agent.sources.mySource.selector.type = replicating
agent.sources.mySource.selector.optional = defaultChannel2
#Describe/configure the channel
agent.channels.defaultChannel1.type = memory
agent.channels.defaultChannel1.capacity = 5000
agent.channels.defaultChannel1.transactionCapacity = 200
agent.channels.defaultChannel2.type = memory
agent.channels.defaultChannel2.capacity = 5000
agent.channels.defaultChannel2.transactionCapacity = 200
#Avro Sink
agent.sinks.mySink1.channel = defaultChannel1
agent.sinks.mySink1.type = avro
agent.sinks.mySink1.hostname = Server2
agent.sinks.mySink1.port = 6666
agent.sinks.mySink2.channel = defaultChannel2
agent.sinks.mySink2.type = avro
agent.sinks.mySink2.hostname = Server3
agent.sinks.mySink2.port = 6666
SERVER2 "PROD" FLUME AGENT
#Describe the top level configuration
agent.sources = mySource
agent.channels = defaultChannel
agent.sinks = mySink
#Describe/configure the source
agent.sources.mySource.type = avro
agent.sources.mySource.port = 6666
agent.sources.mySource.bind = 0.0.0.0
agent.sources.mySource.max-line-length = 150000
agent.sources.mySource.channels = defaultChannel
#Describe/configure the interceptor
agent.sources.mySource.interceptors = myInterceptor
agent.sources.mySource.interceptors.myInterceptor.type = myInterceptor$Builder
#Describe/configure the channel
agent.channels.defaultChannel.type = memory
agent.channels.defaultChannel.capacity = 5000
agent.channels.defaultChannel.transactionCapacity = 200
#Describe/configure the sink
agent.sinks.mySink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.mySink.topic = Server2-topic
agent.sinks.mySink.brokerList = broker1:9092, broker2:9092
agent.sinks.mySink.requiredAcks = -1
agent.sinks.mySink.batchSize = 100
agent.sinks.mySink.channel = defaultChannel
SERVER3 "DEV" FLUME AGENT
#Describe the top level configuration
agent.sources = mySource
agent.channels = defaultChannel
agent.sinks = mySink
#Describe/configure the source
agent.sources.mySource.type = avro
agent.sources.mySource.port = 6666
agent.sources.mySource.bind = 0.0.0.0
agent.sources.mySource.max-line-length = 150000
agent.sources.mySource.channels = defaultChannel
#Describe/configure the interceptor
agent.sources.mySource.interceptors = myInterceptor
agent.sources.mySource.interceptors.myInterceptor.type = myInterceptor$Builder
#Describe/configure the channel
agent.channels.defaultChannel.type = memory
agent.channels.defaultChannel.capacity = 5000
agent.channels.defaultChannel.transactionCapacity = 200
#Describe/configure the sink
agent.sinks.mySink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.mySink.topic = Server3-topic
agent.sinks.mySink.brokerList = broker1:9092, broker2:9092
agent.sinks.mySink.requiredAcks = -1
agent.sinks.mySink.batchSize = 100
agent.sinks.mySink.channel = defaultChannel
Thanks for your help!
I would look at tweaking these configuration parameters, as the behaviour has to do with the memory channel:
agent.channels.defaultChannel.capacity = 5000
agent.channels.defaultChannel.transactionCapacity = 200
Possibly try doubling them first, then perform the test again, and you should see improvements:
agent.channels.defaultChannel.capacity = 10000
agent.channels.defaultChannel.transactionCapacity = 400
It would also be good to observe the JVMs of the Apache Flume instances during your tests.