I am using Kafka 0.8 and am very new to it.
I want to get the list of topics created on the Kafka server along with their metadata.
Is there any API available to find this out?
Basically, I need to write a Java consumer that auto-discovers any topic on the Kafka server. There is an API to fetch TopicMetadata, but it needs the topic name as an input parameter. I need information for all topics present on the server.
With Kafka 0.9.0 you can list the topics on the server with the consumer's listTopics() method, e.g.:
Map<String, List<PartitionInfo> > topics;
Properties props = new Properties();
props.put("bootstrap.servers", "1.2.3.4:9092");
props.put("group.id", "test-consumer-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
topics = consumer.listTopics();
consumer.close();
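The returned map holds the metadata per topic (each PartitionInfo carries the partition number, leader and replicas). As a small follow-up sketch, you could print a summary like this:
for (Map.Entry<String, List<PartitionInfo>> entry : topics.entrySet()) {
    // Key is the topic name; the value is its partition metadata.
    System.out.println(entry.getKey() + ": " + entry.getValue().size() + " partitions");
}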
I think this is the best way:
ZkClient zkClient = new ZkClient("zkHost:zkPort");
List<String> topics = JavaConversions.asJavaList(ZkUtils.getAllTopics(zkClient));
A good place to start would be the sample shell scripts shipped with Kafka.
In the /bin directory of the distribution there are some shell scripts you can use, one of which is ./kafka-list-topic.sh
If you run that without specifying a topic, it will return all topics with their metadata.
See:
https://github.com/apache/kafka/blob/0.8/bin/kafka-list-topic.sh
That shell script in turn runs:
https://github.com/apache/kafka/blob/0.8/core/src/main/scala/kafka/admin/ListTopicCommand.scala
The above are both references to the 0.8 Kafka version, so if you're using a different version (even a point difference), be sure to use the appropriate branch/tag on GitHub.
Using Scala:
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer

object KafkaTest {
  def main(args: Array[String]): Unit = {
    val brokers = args(0)
    val props = new Properties()
    props.put("bootstrap.servers", brokers)
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    val topics = consumer.listTopics().keySet()
    println(topics)
  }
}
If you want to pull broker or other Kafka information from ZooKeeper, then kafka.utils.ZkUtils provides a nice interface. Here is the code I have to list all ZooKeeper brokers (there are a ton of other methods there):
List<Broker> listBrokers() {
    final ZkConnection zkConnection = new ZkConnection(connectionString);
    final int sessionTimeoutMs = 10 * 1000;
    final int connectionTimeoutMs = 20 * 1000;
    final ZkClient zkClient = new ZkClient(connectionString,
                                           sessionTimeoutMs,
                                           connectionTimeoutMs,
                                           ZKStringSerializer$.MODULE$);
    final ZkUtils zkUtils = new ZkUtils(zkClient, zkConnection, false);
    return scala.collection.JavaConversions.seqAsJavaList(zkUtils.getAllBrokersInCluster());
}
You can use the ZooKeeper API to get the list of brokers, as shown below:
ZooKeeper zk = new ZooKeeper("zookeeperhost", 10000, null);
List<String> ids = zk.getChildren("/brokers/ids", false);
List<Map> brokerList = new ArrayList<>();
ObjectMapper objectMapper = new ObjectMapper();

for (String id : ids) {
    Map map = objectMapper.readValue(zk.getData("/brokers/ids/" + id, false, null), Map.class);
    brokerList.add(map);
}
Use this broker list to get all the topics, as described at the following link:
https://cwiki.apache.org/confluence/display/KAFKA/Finding+Topic+and+Partition+Leader
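For reference, a minimal sketch along the lines of that wiki page, using the 0.8 SimpleConsumer to fetch topic metadata (an empty topic list requests metadata for all topics; host, port and client id here are assumptions):
import java.util.Collections;
import kafka.javaapi.TopicMetadata;
import kafka.javaapi.TopicMetadataRequest;
import kafka.javaapi.TopicMetadataResponse;
import kafka.javaapi.consumer.SimpleConsumer;

public class AllTopicsMetadata {
    public static void main(String[] args) {
        // Connect to any one of the brokers discovered from ZooKeeper.
        SimpleConsumer consumer =
                new SimpleConsumer("broker1", 9092, 100000, 64 * 1024, "topic-lookup");
        try {
            // An empty topic list returns metadata for every topic on the cluster.
            TopicMetadataRequest request =
                    new TopicMetadataRequest(Collections.<String>emptyList());
            TopicMetadataResponse response = consumer.send(request);
            for (TopicMetadata metadata : response.topicsMetadata()) {
                System.out.println(metadata.topic() + ": "
                        + metadata.partitionsMetadata().size() + " partitions");
            }
        } finally {
            consumer.close();
        }
    }
}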
Related
I am using kafka_2.12 version 2.3.0 where I am publishing data into kafka topic using partition and key. I need to find a way using which I can consume a particular message from topic using key and partition combination. That way I won't have to consume all the messages and iterate for the correct one.
Right now I am only able to do this
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props)
consumer.subscribe(Collections.singletonList("topic"))
ConsumerRecords<String, String> records = consumer.poll(100)
def data = records.findAll {
it -> it.key().equals(key)
}
You can't "get messages by key from Kafka".
One solution, if practical, would be to have as many partitions as keys and always route messages for a key to the same partition.
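A minimal producer-side sketch of that routing (the broker address, string keys/values, topic name and key-to-partition mapping are assumptions for illustration):
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyedPartitionProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        String key = "device-42";   // illustrative key
        int numPartitions = 10;     // must match the topic's partition count
        // Derive a fixed partition from the key so every message for this key lands together.
        int partition = Math.abs(key.hashCode() % numPartitions);

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("some-topic", partition, key, "payload"));
        }
    }
}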
Message Key as Partition
kafkaConsumer.assign(topicPartitions);
kafkaConsumer.seekToBeginning(topicPartitions);
// Pull records from kafka, keep polling until we get nothing back
final List<ConsumerRecord<byte[], byte[]>> allRecords = new ArrayList<>();
ConsumerRecords<byte[], byte[]> records;
do {
    // Grab records from kafka
    records = kafkaConsumer.poll(2000L);
    logger.info("Found {} records in kafka", records.count());

    // Add to our array list
    records.forEach(allRecords::add);
} while (!records.isEmpty());
Access messages of a Topic using Topic Name only
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList(<Topic Name>,<Topic Name>));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records)
        System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
}
There are two ways to consume topics/partitions:
KafkaConsumer.assign() : Document link
KafkaConsumer.subscribe() : Document link
So you can't get messages by key.
If you don't plan to expand partitions, consider using the assign() method, because all the messages that come with a specific key will go to the same partition.
How to use:
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(properties);
TopicPartition partition = new TopicPartition("some-topic", 0);
consumer.assign(Arrays.asList(partition));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        if (record.key().equals(key)) {
            // handle the record whose key matches
        }
    }
}
I am continuously sending data in Avro format to a topic called "SD_RTL". This data generation is possible through a Confluent datagen that conforms to a custom Avro schema.
When I use kafka-avro-console-consumer on this topic, I see my data correctly. Here's an example of one correctly received tuple:
{"_id":1276215,"serialno":"0","timestamp":416481,"locationid":"Location_0","gpscoords":{"latitude":-2.9789479087622794,"longitude":-4.344459940322691},"data":{"tag1":0,"tag2":1}}
The problem appears when I try to consume this data through a Java app. I get the error "unknown magic byte".
I am using code inspired by the second snippet under the Serializer section on Confluent's website.
This is my code:
//consumer properties
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");
//string inputs and outputs
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "io.confluent.kafka.serializers.KafkaAvroDeserializer");
props.put("schema.registry.url", "localhost:8081");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
//subscribe to topic
String topic = "SD_RTL";
final Consumer<String, SensorsPayload> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList(topic));
try {
    while (true) {
        ConsumerRecords<String, SensorsPayload> records = consumer.poll(100);
        for (ConsumerRecord<String, SensorsPayload> record : records) {
            System.out.printf("offset = %d, key = %s, value = %s \n", record.offset(), record.key(), record.value());
        }
    }
} finally {
    consumer.close();
}
My code and confluent's code are a bit different. For example, Confluent uses the line:
final Consumer<String, GenericRecord> consumer = new KafkaConsumer<String, String>(props);
Whereas if I make the right-hand side <String, String> instead of <String, SensorsPayload>, IntelliJ complains about incompatible types. I'm not sure whether that's related to my issue.
I've generated my SensorsPayload class automatically from an Avro schema through the avro-maven-plugin.
Why does my Consumer app generate an "unknown magic byte" error when kafka's avro console consumer does not?
While trying to configure a newly created Kafka topic using the Java Kafka AdminClient, values are overwritten.
I have tried to set the same topic configuration using console commands, and it works. Unfortunately, when I try it through Java code, some values collide and are overwritten.
ConfigResource resource = new ConfigResource(ConfigResource.Type.TOPIC, topicName);
Map<ConfigResource, Config> updateConfig = new HashMap<>();
// update retention Bytes for this topic
ConfigEntry retentionBytesEntry = new ConfigEntry(TopicConfig.RETENTION_BYTES_CONFIG, String.valueOf(retentionBytes));
updateConfig.put(resource, new Config(Collections.singleton(retentionBytesEntry)));
// update retention ms for this topic
ConfigEntry retentionMsEntry = new ConfigEntry(TopicConfig.RETENTION_MS_CONFIG, String.valueOf(retentionMs));
updateConfig.put(resource, new Config(Collections.singleton(retentionMsEntry)));
// update segment Bytes for this topic
ConfigEntry segmentBytesEntry = new ConfigEntry(TopicConfig.SEGMENT_BYTES_CONFIG, String.valueOf(segmentbytes));
updateConfig.put(resource, new Config(Collections.singleton(segmentBytesEntry)));
// update segment ms for this topic
ConfigEntry segmentMsEntry = new ConfigEntry(TopicConfig.SEGMENT_MS_CONFIG, String.valueOf(segmentMs));
updateConfig.put(resource, new Config(Collections.singleton(segmentMsEntry)));
// Update the configuration
client.alterConfigs(updateConfig);
I expect the topic to have all given configuration values correctly.
Your logic is not working correctly because you call Map.put() several times with the same key, so only the last entry is kept.
The correct way to specify multiple topic configurations is to add all the ConfigEntry objects to a single Config object, and only then put that Config into the Map.
For example:
// Your Topic Resource
ConfigResource cr = new ConfigResource(Type.TOPIC, "mytopic");
// Create all your configurations
Collection<ConfigEntry> entries = new ArrayList<>();
entries.add(new ConfigEntry(TopicConfig.SEGMENT_BYTES_CONFIG, String.valueOf(segmentbytes)));
entries.add(new ConfigEntry(TopicConfig.RETENTION_BYTES_CONFIG, String.valueOf(retentionBytes)));
...
// Create the Map
Config config = new Config(entries);
Map<ConfigResource, Config> configs = new HashMap<>();
configs.put(cr, config);
// Call alterConfigs()
admin.alterConfigs(configs);
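If you want to double-check the result, you can read the configuration back with describeConfigs() once the alterConfigs() call has completed (e.g. by blocking on its returned future). A small sketch reusing the admin and cr variables from the example above; the blocking get() calls can throw InterruptedException/ExecutionException:
// Read the configuration back to verify all entries were applied.
Config current = admin.describeConfigs(Collections.singleton(cr)).all().get().get(cr);
System.out.println(current.get(TopicConfig.RETENTION_BYTES_CONFIG).value());
System.out.println(current.get(TopicConfig.SEGMENT_BYTES_CONFIG).value());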
I have many objects of a class, say Test, which I want to write to Kafka and process with a Spark Streaming app. I want to use Kryo serialization.
My application is in Java:
JavaDStream<Test> testData = KafkaUtils
    .createDirectStream(context, keyClass, valueClass, keyDecoderClass, valueDecoderClass, props, topics);
My question is: what should I put for keyClass, valueClass, keyDecoderClass and valueDecoderClass?
Say your key is a String and your value is Test; then first you would need to create TestEncoder and TestDecoder classes by implementing kafka.serializer.Encoder and kafka.serializer.Decoder. Now in your createDirectStream method you can have:
JavaPairInputDStream<String, Test> testData = KafkaUtils
    .createDirectStream(context, String.class, Test.class, StringDecoder.class, TestDecoder.class, props, topics);
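As a rough sketch (not the only way to do it), a Kryo-backed encoder/decoder pair could look like the following; the class names are illustrative, Test is assumed to have a no-arg constructor for Kryo's default instantiation, and the VerifiableProperties constructor is what the old Kafka API uses to instantiate these classes reflectively:
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.ByteArrayOutputStream;
import kafka.serializer.Decoder;
import kafka.serializer.Encoder;
import kafka.utils.VerifiableProperties;

// Serializes Test objects with Kryo; registered via "serializer.class" on the producer.
class TestEncoder implements Encoder<Test> {
    private final Kryo kryo = new Kryo();

    public TestEncoder(VerifiableProperties props) { }

    @Override
    public byte[] toBytes(Test test) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Output output = new Output(bytes);
        kryo.writeObject(output, test);
        output.close();
        return bytes.toByteArray();
    }
}

// Deserializes Test objects; passed as valueDecoderClass to createDirectStream.
class TestDecoder implements Decoder<Test> {
    private final Kryo kryo = new Kryo();

    public TestDecoder(VerifiableProperties props) { }

    @Override
    public Test fromBytes(byte[] bytes) {
        return kryo.readObject(new Input(bytes), Test.class);
    }
}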
You can refer to the KafkaKryoEncoder at https://www.tomsdev.com/blog/2015/storm-kafka-complex-types/
In your Kafka producer you would need to register your custom Encoder class like this:
Properties properties = new Properties();
properties.put("metadata.broker.list", brokerList);
properties.put("serializer.class", "com.my.TestEncoder");
Producer<String, Test> producer = new Producer<String, Test>(new ProducerConfig(properties));
Test test = new Test();
KeyedMessage<String, Test> data = new KeyedMessage<String, Test>("myTopic", test);
producer.send(data);
I'm using KafkaConsumer to consume messages from Kafka server topics.
It works fine for topics created before the consumer code is started.
But the problem is, it does not work for topics created dynamically (I mean after the consumer code has started), even though the API says it supports dynamic topic creation. Here is the link for your reference:
Kafka version used: 0.9.0.1
https://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
Here is the Java code:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "false");
props.put("auto.commit.interval.ms", "1000");
props.put("session.timeout.ms", "30000");
props.put("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
Pattern r = Pattern.compile("siddu(\\d)*");
consumer.subscribe(r, new HandleRebalance());
try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Long.MAX_VALUE);
        for (TopicPartition partition : records.partitions()) {
            List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
            for (ConsumerRecord<String, String> record : partitionRecords) {
                System.out.println(partition.partition() + ": " + record.offset() + ": " + record.value());
            }
            long lastOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
            consumer.commitSync(Collections.singletonMap(partition, new OffsetAndMetadata(lastOffset + 1)));
        }
    }
} finally {
    consumer.close();
}
NOTE: My topic names match the regular expression.
And if I restart the consumer, it will start reading the messages pushed to the topic.
Any help is really appreciated.
There was an answer to this in the Apache Kafka mailing list archives. I am copying it below:
The consumer supports a configuration option "metadata.max.age.ms"
which basically controls how often topic metadata is fetched. By
default, this is set fairly high (5 minutes), which means it will take
up to 5 minutes to discover new topics matching your regular
expression. You can set this lower to discover topics quicker.
So in your props you can:
props.put("metadata.max.age.ms", 5000);
This will cause your consumer to find out about new topics every 5 seconds.
You can hook into Zookeeper. Check out the sample code. In essence, you will create a watcher on the Zookeeper node /brokers/topics. When new children are added here, it's a new Topic being added, and your watcher will get triggered.
Note that the difference between this and the other answer is that this one is a trigger whereas the other is polling -- this one will be as close to real-time as possible, while the other will be within whatever your polling interval is, at best.
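A minimal sketch of that approach using the plain ZooKeeper client (the connection string and timeout are assumptions; ZooKeeper watches are one-shot, so the watcher re-registers itself when it fires):
import java.util.List;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class TopicWatcherSketch {
    public static void main(String[] args) throws Exception {
        // Assumed ZooKeeper connection string; adjust for your environment.
        final ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, event -> { });

        Watcher topicsWatcher = new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getType() == Event.EventType.NodeChildrenChanged) {
                    try {
                        // Watches fire only once, so re-register while reading the new list.
                        List<String> topics = zk.getChildren("/brokers/topics", this);
                        System.out.println("Topic list changed: " + topics);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        };

        // Initial read registers the watch on /brokers/topics.
        System.out.println("Current topics: " + zk.getChildren("/brokers/topics", topicsWatcher));
        Thread.sleep(Long.MAX_VALUE); // keep the process alive so the watcher can fire
    }
}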
Here is the solution that worked for me, using the KafkaConsumer API. Here is the Java code:
private static Consumer<Long, String> createConsumer(String topic) {
    final Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "KafkaExampleConsumer");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

    // Create the consumer using props.
    final Consumer<Long, String> consumer = new KafkaConsumer<>(props);

    // Subscribe to the topic.
    consumer.subscribe(Collections.singletonList(topic));
    return consumer;
}

public static void runConsumer(String topic) throws InterruptedException {
    final Consumer<Long, String> consumer = createConsumer(topic);
    ConsumerRecords<Long, String> records = consumer.poll(100);
    for (ConsumerRecord<Long, String> record : records)
        System.out.printf("hiiiii offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
    consumer.commitAsync();
    consumer.close();
    //System.out.println("DONE");
}
Using this, we can consume messages from dynamically created topics.
Use the subscribe method in the KafkaConsumer class, which takes a Pattern as an argument for the set of topics to get data from:
/**
 * Subscribe to all topics matching specified pattern to get dynamically assigned partitions.
 * The pattern matching will be done periodically against all topics existing at the time of check.
 * This can be controlled through the {@code metadata.max.age.ms} configuration: by lowering
 * the max metadata age, the consumer will refresh metadata more often and check for matching topics.
 *
 * See {@link #subscribe(Collection, ConsumerRebalanceListener)} for details on the
 * use of the {@link ConsumerRebalanceListener}. Generally rebalances are triggered when there
 * is a change to the topics matching the provided pattern and when consumer group membership changes.
 * Group rebalances only take place during an active call to {@link #poll(Duration)}.
 *
 * @param pattern Pattern to subscribe to
 * @param listener Non-null listener instance to get notifications on partition assignment/revocation
 *                 for the subscribed topics
 * @throws IllegalArgumentException If pattern or listener is null
 * @throws IllegalStateException If {@code subscribe()} is called previously with topics, or assign is called
 *                               previously (without a subsequent call to {@link #unsubscribe()}), or
 *                               if not configured at-least one partition assignment strategy
 */
@Override
public void subscribe(Pattern pattern, ConsumerRebalanceListener listener) {