Apache Storm - KafkaSpout not consuming messages from Kafka Topic - java

I am trying to integrate Kafka into a Storm topology using the code below, but unfortunately the KafkaSpout is not consuming messages from the Kafka topic. In the Storm UI, the Emitted count remains 0 forever.
String bootStrapServer = "10.20.10.238:9092";
String topic = "test.topic";

KafkaSpoutConfig.Builder spoutConfigBuilder = KafkaSpoutConfig.builder(bootStrapServer, topic);
spoutConfigBuilder.setProp(ConsumerConfig.RECEIVE_BUFFER_CONFIG, 100 * 1024 * 1024);
spoutConfigBuilder.setProp(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 100 * 1024 * 1024);
spoutConfigBuilder.setProcessingGuarantee(KafkaSpoutConfig.ProcessingGuarantee.AT_LEAST_ONCE);

Boolean readFromStart = true;
if (readFromStart) {
    spoutConfigBuilder.setFirstPollOffsetStrategy(FirstPollOffsetStrategy.EARLIEST);
} else {
    spoutConfigBuilder.setFirstPollOffsetStrategy(FirstPollOffsetStrategy.LATEST);
}

KafkaSpout spout = new KafkaSpout(spoutConfigBuilder.build());
builder.setSpout("kafkaSpout", spout, 1);

// And a Bolt to see messages
builder.setBolt("fcBolt", new FcBolt(), 1).setNumTasks(1).shuffleGrouping("kafkaSpout");
But when I check the produced messages from the CLI, I can see all the messages on the topic with the following command:
bin/kafka-console-consumer.sh --topic test.topic --from-beginning --bootstrap-server 10.20.10.238:9092
Picked up _JAVA_OPTIONS: -Xmx128000m
test
test
test1
....
Versions:
Storm : 2.2.0
Kafka : 2.6.0 (kafka_2.13-2.6.0 build)
With older versions it worked fine! I must have missed something when reading up on the newer version.
Any help appreciated. Thanks in Advance!

Hard to know with what you have, so consider showing the rest of your code too.
But from what you do have, it does not appear that you are actually producing any events.
If you are trying to consume Kafka events in your spout for further processing, make sure you are actually subscribed to a topic that is having events produced to it. Also note that you will not see your Storm processing output through the console consumer, since Storm is consuming those events, not producing new ones.
If you are instead trying to produce Kafka events to the test topic through Storm and then consume them through the console consumer, make sure you are actually producing events in Storm.
Hope that puts you on the right path. I would suggest going over the base concepts of Kafka here: Kafka Introduction
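As a quick sanity check, you could run a minimal standalone producer against the broker and topic from the question and watch whether the spout's Emitted count starts moving. This is only an illustrative sketch (the class name is made up; broker address and topic come from the question):
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TestProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "10.20.10.238:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Send a handful of records to the topic the spout subscribes to
            for (int i = 0; i < 10; i++) {
                producer.send(new ProducerRecord<>("test.topic", "key-" + i, "test-" + i));
            }
            producer.flush();
        }
    }
}
If the console consumer sees these records but the spout's Emitted count still stays at 0, the problem is on the Storm side rather than the Kafka side.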

Related

How to consume a Kafka topic from a specific offset?

Recently I have been using Kafka. I have a topic, and I am using the following code to consume from it:
@KafkaListener(topics = "topic_name", groupId = "_id", id = "pro", containerFactory = "kafkaListenerContainerFactory")
public void consume(ConsumerRecord<String, String> record, Acknowledgment ack) {
    kafkaService.proccessorConsumer(record);
    ack.acknowledge();
}
Everything works fine, but I need to handle the situation where, if the service stops for any reason and then starts again, I want to continue consuming from the last message that was processed. I understand that the acknowledgment helps with this, but for the sake of certainty I have also saved the last consumed offset somewhere.
My question is how I can use that offset to start consuming the topic from it.
As @OneCricketeer indicates, what you are trying to achieve is the default behaviour of the Kafka consumer, provided you haven't disabled automatic commit.
You can check this by describing your consumer group using the group id as follows; just verify that the offset of your consumer is the same as the one you have stored elsewhere.
> bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group-id
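If you still want to seek explicitly to an offset you stored yourself, one option with a recent Spring Kafka version is to implement ConsumerSeekAware on the listener class. The sketch below reuses the topic and group names from the question; the loadSavedOffset() helper is a hypothetical placeholder for your own offset store:
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.TopicPartition;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.listener.ConsumerSeekAware;

public class OffsetSeekingConsumer implements ConsumerSeekAware {

    @Override
    public void onPartitionsAssigned(Map<TopicPartition, Long> assignments, ConsumerSeekCallback callback) {
        // When partitions are assigned, seek each one to the offset stored elsewhere
        assignments.keySet().forEach(tp ->
                callback.seek(tp.topic(), tp.partition(), loadSavedOffset(tp)));
    }

    @KafkaListener(topics = "topic_name", groupId = "_id")
    public void consume(ConsumerRecord<String, String> record) {
        // process the record as before
    }

    private long loadSavedOffset(TopicPartition tp) {
        // hypothetical: read the last processed offset for this partition from your own store
        return 0L;
    }
}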

How to publish a message to a Kafka Topic in Lagom

I have started using Lagom recently. I am trying out a microservice where I receive a Kafka message and, after some processing, publish another message to a different Kafka topic. Based on this link, my understanding is that a message should be published on the constructed topic - this is the part of the sample code I am referring to:
final PubSubRef<Temperature> topic = pubSub.refFor(TopicId.of(Temperature.class, id));
topic.publish(temperature);
I couldn't build a Temperature DTO to POST from a REST client, so I created my own DTO that is exactly like HelloEvent - in my case it is KafkaEvent.
I tried to use the code from here.
However, I did not see the topic created after performing the POST operation. I did add print statements, and they do appear in the console.
System.out.println("Received id:" + id);
final PubSubRef<KafkaEvent> topic = pubSub.refFor(TopicId.of(KafkaEvent.class, id));
topic.publish(temperature);
System.out.println("Sent to:" + topic.toString());
I am not seeing any errors in the Kafka server log or in my project.
Is there any step I am missing, or is my understanding of how to use PubSubRegistry wrong?
Please do let me know if further details are required.
Thanks in advance,
Naveena
If you want to use Kafka, you are using the incorrect approach. The post you described does not use Kafka; it just broadcasts messages to all subscribers within the service. If you want to use Kafka, you need to use Lagom's message broker support, which will create the topic you want. Please also read the limitations section; it will give you more information.
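In rough outline, with the message broker support you declare the topic in the service descriptor and implement it with a TopicProducer instead of calling topic.publish(...) on a PubSubRef. This is only a sketch (the service and event names are taken from the question; check the Lagom message broker documentation for the full wiring):
import static com.lightbend.lagom.javadsl.api.Service.named;
import static com.lightbend.lagom.javadsl.api.Service.topic;

import com.lightbend.lagom.javadsl.api.Descriptor;
import com.lightbend.lagom.javadsl.api.Service;
import com.lightbend.lagom.javadsl.api.broker.Topic;

public interface KafkaEventService extends Service {

    // Events published to this topic end up on a real Kafka topic named "kafka-events"
    Topic<KafkaEvent> kafkaEventsTopic();

    @Override
    default Descriptor descriptor() {
        return named("kafka-event-service")
                .withTopics(
                        topic("kafka-events", this::kafkaEventsTopic)
                );
    }
}
The implementation of kafkaEventsTopic() would then return something built with TopicProducer.singleStreamWithOffset(...) from the entity's persisted event stream, rather than publishing through PubSubRegistry.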

Apache Storm Trident and Kafka Spout Integration

I am unable to find good documentation for correctly integrating Kafka with Apache Storm Trident. I looked at the related previously posted questions here, but found no sufficient information.
I would like to connect Trident with Kafka using an OpaqueTridentKafkaSpout. Here is the sample code which is currently working:
GlobalPartitionInformation globalPartitionInformation =
        new GlobalPartitionInformation(properties.getProperty("topic", "mytopic"));
Broker brokerForPartition0 = new Broker("IP1", 9092);
Broker brokerForPartition1 = new Broker("IP2", 9092);
Broker brokerForPartition2 = new Broker("IP3", 9092);
globalPartitionInformation.addPartition(0, brokerForPartition0); // mapping from partition 0 to brokerForPartition0
globalPartitionInformation.addPartition(1, brokerForPartition1); // mapping from partition 1 to brokerForPartition1
globalPartitionInformation.addPartition(2, brokerForPartition2); // mapping from partition 2 to brokerForPartition2
StaticHosts staticHosts = new StaticHosts(globalPartitionInformation);
TridentKafkaConfig tridentKafkaConfig = new TridentKafkaConfig(staticHosts, properties.getProperty("topic", "mytopic"));
tridentKafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
OpaqueTridentKafkaSpout kafkaSpout = new OpaqueTridentKafkaSpout(tridentKafkaConfig);
With this I am able to generate streams for my topology, as shown in the code below:
TridentTopology topology = new TridentTopology();
Stream analyticsStream = topology.newStream("spout", kafkaSpout)
        .parallelismHint(Integer.valueOf(properties.getProperty("spout", "6")));
Though I have provided a parallelism hint and my topic has multiple partitions, only 1 executor of the Kafka spout is running, so I am unable to scale it.
Can anyone please suggest better ways of integrating Apache Storm Trident (2.0.0) with Apache Kafka (1.0), each running as a 3-node cluster?
Also, as soon as it finishes reading from Kafka, I keep getting these logs constantly:
2018-04-09 14:17:34.119 o.a.s.k.KafkaUtils Thread-15-spout-spout-executor[79 79] [INFO] Metrics Tick: Not enough data to calculate spout lag.
2018-04-09 14:17:34.129 o.a.s.k.KafkaUtils Thread-21-spout-spout-executor[88 88] [INFO] Metrics Tick: Not enough data to calculate spout lag.
And in the Storm UI, I can see acks for the messages above. Any suggestion on how to suppress these metrics ticks?
If you are on Storm 2.0.0 anyway, I think you should switch to the storm-kafka-client Trident spout. The storm-kafka module is only intended to support older Kafka versions, since the underlying Kafka API (SimpleConsumer) is being removed. The new module supports Kafka from 0.10.0.0 onward.
You can find an example Trident topology for the new spout here: https://github.com/apache/storm/blob/master/examples/storm-kafka-client-examples/src/main/java/org/apache/storm/kafka/trident/TridentKafkaClientTopologyNamedTopics.java
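In outline, the newer integration looks roughly like the sketch below. Treat the linked example as authoritative: the config class names have shifted a little between storm-kafka-client releases (older releases built the Trident spout from a plain KafkaSpoutConfig). The broker addresses and topic name are reused from the question:
import org.apache.storm.kafka.spout.FirstPollOffsetStrategy;
import org.apache.storm.kafka.spout.trident.KafkaTridentSpoutConfig;
import org.apache.storm.kafka.spout.trident.KafkaTridentSpoutOpaque;
import org.apache.storm.trident.Stream;
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

// Opaque Trident spout from storm-kafka-client, replacing the old storm-kafka classes
KafkaTridentSpoutConfig<String, String> spoutConfig =
        KafkaTridentSpoutConfig.builder("IP1:9092,IP2:9092,IP3:9092", "mytopic")
                .setRecordTranslator(r -> new Values(r.value()), new Fields("str"))
                .setFirstPollOffsetStrategy(FirstPollOffsetStrategy.EARLIEST)
                .build();

TridentTopology topology = new TridentTopology();
Stream analyticsStream = topology
        .newStream("spout", new KafkaTridentSpoutOpaque<>(spoutConfig))
        .parallelismHint(3); // at most one useful spout executor per Kafka partition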

Kafka Consumer Properties to read from the maximum offset

I have written a Java Kafka consumer. I would like to know how to explicitly ensure that, once the consumer is started, it only reads messages sent by the producer from that point onwards, i.e. it should not read any messages that have already been sent to Kafka. Can anyone explain how to ensure this?
Here is a snippet of the properties I use:
Properties properties = new Properties();
properties.put("zookeeper.connect", zookeeperHost);
properties.put("group.id", group);
properties.put("auto.offset.reset","largest");
ConsumerConfig consumerConfig = new ConsumerConfig(properties);
consumerConnector = Consumer.createJavaConsumerConnector(consumerConfig);
UPDATE Sept14:
I am using the following properties, but it seems that the consumer still reads from the beginning at times. Can someone tell me what's wrong now?
I am using Kafka Version 0.8.2
properties.put("auto.offset.reset","largest");
properties.put("auto.commit.enable","false");
Based on the answers above, it seems that the correct mechanism for setting the consumer properties is as follows:
properties.put("auto.offset.reset","largest");
properties.put("auto.commit.enable","false");
This ensures reading from the maximum (latest) offset, so only messages produced after the consumer starts are read.
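For reference, the properties above belong to the old zookeeper-based (0.8.x) consumer. With the newer Java consumer API, the equivalent "only read messages produced from now on" setup would be roughly the following sketch; the broker, group, and topic names are placeholders:
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "my-group");
// "latest" is the new-consumer equivalent of the old "largest" setting
props.put("auto.offset.reset", "latest");
props.put("enable.auto.commit", "false");
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-topic"));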

KafkaSpout is not receiving anything from Kafka

I am trying to rig up a Kafka-Storm "Hello World" system. I have Kafka installed and running, and when I send data with the Kafka producer I can read it with the Kafka console consumer.
I took the Chapter 02 example from the "Getting Started With Storm" O'Reilly book and modified it to use KafkaSpout instead of a regular spout.
When I run the application, with data already pending in Kafka, nextTuple of the KafkaSpout doesn't get any messages - it goes in, tries to iterate over an empty managers list under the coordinator, and exits.
My environment is a fairly old Cloudera VM, with Storm 0.9, Kafka-Storm-0.9 (the latest), and Kafka 2.9.2-0.7.0.
This is how I defined the SpoutConfig and the topology:
String zookeepers = "localhost:2181";

SpoutConfig spoutConfig = new SpoutConfig(new SpoutConfig.ZkHosts(zookeepers, "/brokers"),
        "gtest",
        "/kafka",      // zookeeper root path for offset storing
        "KafkaSpout");
spoutConfig.forceStartOffsetTime(-1);

KafkaSpoutTester kafkaSpout = new KafkaSpoutTester(spoutConfig);

// Topology definition
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("word-reader", kafkaSpout, 1);
builder.setBolt("word-normalizer", new WordNormalizer())
        .shuffleGrouping("word-reader");
builder.setBolt("word-counter", new WordCounter(), 1)
        .fieldsGrouping("word-normalizer", new Fields("word"));

// Configuration
Config conf = new Config();
conf.put("wordsFile", args[0]);
conf.setDebug(false);

// Topology run
conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);
cluster = new LocalCluster();
cluster.submitTopology("Getting-Started-Toplogie", conf, builder.createTopology());
Can someone please help me figure out why I am not receiving anything?
Thanks,
G.
If you've already consumed the messages, the spout is not supposed to read any more unless your producer produces new ones. That is because of the forceStartOffsetTime call with -1 in your code.
From the storm-contrib documentation:
Another very useful config in the spout is the ability to force the spout to rewind to a previous offset. You do forceStartOffsetTime on the spout config, like so:
spoutConfig.forceStartOffsetTime(-2);
It will choose the latest offset written around that timestamp to start consuming. You can force the spout to always start from the latest offset by passing in -1, and you can force it to start from the earliest offset by passing in -2.
What does your producer look like? A snippet would be useful. You can replace -1 with -2 and see if you receive anything; if your producer is fine, then you should be able to consume.
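Applied to the spoutConfig from the question, that swap is just (a minimal sketch):
// -2 = earliest available offset: replay messages already sitting in the topic
spoutConfig.forceStartOffsetTime(-2);
// -1 = latest offset: only read messages produced after the topology starts
// spoutConfig.forceStartOffsetTime(-1);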
SpoutConfig spoutConf = new SpoutConfig(...);
spoutConf.startOffsetTime = kafka.api.OffsetRequest.LatestTime();
SpoutConfig spoutConfig = new SpoutConfig(new SpoutConfig.ZkHosts(zookeepers, "/brokers"),
"gtest", // name of topic used by producer & consumer
"/kafka", // zookeeper root path for offset storing
"KafkaSpout");
You are using "gtest" topic for receiving the data. Make sure that you are sending data from this topic by producer.
And in the bolt, print that tuple like that
public void execute(Tuple tuple, BasicOutputCollector collector) {
    System.out.println(tuple);
}
It should print the data pending in Kafka.
I went through some grief getting Storm and Kafka integrated. These are both fast-moving and relatively young projects, so it can be hard to find working examples to jump-start your development.
To help other developers (and hopefully get others contributing useful examples that I can use as well), I started a github project to house code snippets related to Storm/Kafka (and Esper) development.
You are welcome to check it out here:
https://github.com/buildlackey/cep
(click on the storm+kafka directory for a sample program that should get you up and running).