I have a Kafka producer class that works fine; it fills the Kafka topic. Its code is shown below:
public class kafka_test {
    private final static String TOPIC = "flinkTopic";
    private final static String BOOTSTRAP_SERVERS = "10.32.0.2:9092,10.32.0.3:9092,10.32.0.4:9092";

    public FlinkKafkaConsumer<String> createStringConsumerForTopic(
            String topic, String kafkaAddress, String kafkaGroup) {
        // ************************** KAFKA Properties ******
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", kafkaAddress);
        props.setProperty("group.id", kafkaGroup);
        FlinkKafkaConsumer<String> myconsumer = new FlinkKafkaConsumer<>(
                topic, new SimpleStringSchema(), props);
        myconsumer.setStartFromLatest();
        return myconsumer;
    }

    private static Producer<Long, String> createProducer() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
        props.put(ProducerConfig.CLIENT_ID_CONFIG, "MyKafkaProducer");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        return new KafkaProducer<>(props);
    }

    public void runProducer(String msg) throws Exception {
        final Producer<Long, String> producer = createProducer();
        try {
            final ProducerRecord<Long, String> record = new ProducerRecord<>(TOPIC, msg);
            RecordMetadata metadata = producer.send(record).get();
            System.out.printf("sent record(key=%s value='%s') metadata(partition=%d, offset=%d)\n",
                    record.key(), record.value(), metadata.partition(), metadata.offset());
        } finally {
            producer.flush();
            producer.close();
        }
    }
}
public class producerTest {
    public static void main(String[] args) throws Exception {
        kafka_test objKafka = new kafka_test();
        String pathFile = "/home/cfms11/IdeaProjects/pooyaflink2/KafkaTest/quickstart/lastDay4.csv";
        String delimiter = "\n";
        objKafka.createStringProducer("flinkTopic",
                "10.32.0.2:9092,10.32.0.3:9092,10.32.0.4:9092");
        Scanner scanner = new Scanner(new File(pathFile));
        scanner.useDelimiter(delimiter);
        int i = 0;
        while (scanner.hasNext()) {
            if (i == 0)
                TimeUnit.MINUTES.sleep(1);
            objKafka.runProducer(scanner.next());
            i++;
        }
        scanner.close();
    }
}
I use Kafka because I want to provide data for my Flink program. This is the code I use to consume data from the Kafka topic:
Properties props = new Properties();
props.setProperty("bootstrap.servers",
        "10.32.0.2:9092,10.32.0.3:9092,10.32.0.4:9092");
props.setProperty("group.id", kafkaGroup);
FlinkKafkaConsumer<String> myconsumer = new FlinkKafkaConsumer<>(
        "flinkTopic", new SimpleStringSchema(), props);
myconsumer.setStartFromEarliest();
DataStream<String> text = env.addSource(myconsumer);
I want to run the producer code at the same time that my Flink program is running. My goal is for the producer to send a record to the topic and for the consumer to be able to poll that record from the topic at the same time.
Could you please tell me how this is possible and how to manage it?
I think you need to create two class files: one for the producer and the other for the consumer. Create the topic first and then run the consumer, or run the producer directly.
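As a rough sketch of that idea (assuming the kafka_test class and file path from the question; the RunBoth class name and the "flinkGroup" group id are placeholders of mine), you can start the producer on its own thread in one JVM while the Flink job consumes the same topic. Running producerTest and the Flink job as two separate programs at the same time works just as well.

import java.io.File;
import java.util.Scanner;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class RunBoth {
    public static void main(String[] args) throws Exception {
        kafka_test objKafka = new kafka_test();

        // Producer side: feed the topic on a background thread.
        Thread producerThread = new Thread(() -> {
            try (Scanner scanner = new Scanner(
                    new File("/home/cfms11/IdeaProjects/pooyaflink2/KafkaTest/quickstart/lastDay4.csv"))
                    .useDelimiter("\n")) {
                while (scanner.hasNext()) {
                    objKafka.runProducer(scanner.next());
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        producerThread.start();

        // Consumer side: the Flink job reads the same topic while the producer keeps writing.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        FlinkKafkaConsumer<String> myconsumer = objKafka.createStringConsumerForTopic(
                "flinkTopic", "10.32.0.2:9092,10.32.0.3:9092,10.32.0.4:9092", "flinkGroup");
        env.addSource(myconsumer).print();
        env.execute("flinkTopic consumer"); // blocks here; the producer thread sends in parallel
    }
}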
Related question: Kafka consumer not receiving messages produced before the consumer gets started.
public class MyKafkaConsumer {
    private final KafkaConsumer<String, String> consumer;
    private final String TOPIC = "javaapp";
    private final String BOOTSTRAP_SERVERS = "localhost:9092";
    private int receivedCounter = 0;
    private ExecutorService executorService = Executors.newFixedThreadPool(1);
    private BlockingQueue<ConsumerRecords<String, String>> queue = new LinkedBlockingQueue<>(500000);

    private MyKafkaConsumer() {
        final Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "KafkaGroup6");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList(TOPIC));
    }

    public static void main(String[] args) throws InterruptedException {
        MyKafkaConsumer perfKafkaConsumer = new MyKafkaConsumer();
        perfKafkaConsumer.consumeMessage();
        perfKafkaConsumer.runConsumer();
    }

    private void runConsumer() throws InterruptedException {
        consumer.poll(Duration.ofMillis(1000));
        while (true) {
            final ConsumerRecords<String, String> consumerRecords = consumer.poll(Duration.ofMillis(10000));
            if (!consumerRecords.isEmpty()) {
                System.out.println("Adding result in queue " + queue.size());
                queue.put(consumerRecords);
            }
            consumer.commitAsync();
        }
    }

    private void consumeMessage() {
        System.out.println("Consumer starts at " + Instant.now());
        executorService.submit(() -> {
            while (true) {
                ConsumerRecords<String, String> poll = queue.take();
                poll.forEach(record -> {
                    System.out.println("Received " + ++receivedCounter + " time " + Instant.now(Clock.systemUTC()));
                });
            }
        });
    }
}
The ConsumerRecords are always empty. I checked the offsets using Kafka Tool.
I have also tried a different group name; it's not working, same issue, i.e. poll returns empty records.
However, if I start my consumer before the producer, then it receives the messages. (kafka-clients version 2.4.1)
The auto.offset.reset consumer setting controls where a new consumer group begins consuming from a topic. By default it is set to 'latest', which sets the consumer group's offset to the latest offset. Set it to 'earliest' if consumer groups should start at the earliest offset in the topic.
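For the consumer in the question, that is a single extra property when building the KafkaConsumer (a sketch of just the relevant part; everything else stays the same). Note that auto.offset.reset only takes effect when the group has no committed offset, so if "KafkaGroup6" has already committed offsets you will also need a new group id to re-read the topic from the beginning.

final Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS);
props.put(ConsumerConfig.GROUP_ID_CONFIG, "KafkaGroup6");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
// added: new consumer groups start from the earliest offset instead of the latest
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
consumer = new KafkaConsumer<>(props);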
I use the following code to create one producer, which produces around 2000 messages.
public class ProducerDemoWithCallback {
    public static void main(String[] args) {
        final Logger logger = LoggerFactory.getLogger(ProducerDemoWithCallback.class);
        String bootstrapServers = "localhost:9092";

        Properties properties = new Properties();
        properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // create the producer
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(properties);

        for (int i = 0; i < 2000; i++) {
            // create a producer record
            ProducerRecord<String, String> record =
                    new ProducerRecord<String, String>("TwitterProducer", "Hello World " + Integer.toString(i));

            // send data - asynchronous
            producer.send(record, new Callback() {
                public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                    // executes every time a record is successfully sent or an exception is thrown
                    if (e == null) {
                        // the record was successfully sent
                        logger.info("Received new metadata. \n" +
                                "Topic:" + recordMetadata.topic() + "\n" +
                                "Partition: " + recordMetadata.partition() + "\n" +
                                "Offset: " + recordMetadata.offset() + "\n" +
                                "Timestamp: " + recordMetadata.timestamp());
                    } else {
                        logger.error("Error while producing", e);
                    }
                }
            });
        }

        // flush and close the producer
        producer.flush();
        producer.close();
    }
}
I want to count those messages and get the total as an int value.
I use this command and it works, but I am trying to get the count using code:
"bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic TwitterProducer --time -1"
and the result is
- TwitterProducer:0:2000
My code to do the same programmatically looks something like this, but I'm not sure if this is the correct way to get the count:
int valueCount = (int) recordMetadata.offset();
System.out.println("Offset value " + valueCount);
Can someone help me get the count of Kafka messages (the end offset value) using code?
You can have a look at the implementation details of GetOffsetShell.
Here is simplified code, rewritten in Java:
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.util.*;
import java.util.stream.Collectors;

public class GetOffsetCommand {
    private static final Set<String> TopicNames = new HashSet<>();

    static {
        TopicNames.add("my-topic");
        TopicNames.add("not-my-topic");
    }

    public static void main(String[] args) {
        TopicNames.forEach(topicName -> {
            final Map<TopicPartition, Long> offsets = getOffsets(topicName);
            new ArrayList<>(offsets.entrySet()).forEach(System.out::println);
            System.out.println(topicName + ":" + offsets.values().stream().reduce(0L, Long::sum));
        });
    }

    private static Map<TopicPartition, Long> getOffsets(String topicName) {
        final KafkaConsumer<String, String> consumer = makeKafkaConsumer();
        final List<TopicPartition> partitions = listTopicPartitions(consumer, topicName);
        return consumer.endOffsets(partitions);
    }

    private static KafkaConsumer<String, String> makeKafkaConsumer() {
        final Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "get-offset-command");
        return new KafkaConsumer<>(props);
    }

    private static List<TopicPartition> listTopicPartitions(KafkaConsumer<String, String> consumer, String topicName) {
        return consumer.listTopics().entrySet().stream()
                .filter(t -> topicName.equals(t.getKey()))
                .flatMap(t -> t.getValue().stream())
                .map(p -> new TopicPartition(p.topic(), p.partition()))
                .collect(Collectors.toList());
    }
}
which produces the end offset for each of the topic's partitions and their sum (the total number of messages), like:
my-topic-0=184
my-topic-2=187
my-topic-4=189
my-topic-1=196
my-topic-3=243
my-topic:999
Why do you want to get that value? If you share more detail about the purpose, I can give you a better tip.
As for your last question: no, that is not the correct way to get the count of messages from the offset value. It would only work if your topic had one partition and a single producer; you need to consider that the topic has several partitions.
If you want to get the number of messages sent by each producer, you can count them in the onCompletion() callback, as sketched below.
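For example (a minimal sketch reusing the producer from the question; the sentCount name is mine, and it needs java.util.concurrent.atomic.AtomicInteger), a thread-safe counter incremented in the callback gives the number of successfully acknowledged sends:

final AtomicInteger sentCount = new AtomicInteger();

for (int i = 0; i < 2000; i++) {
    ProducerRecord<String, String> record =
            new ProducerRecord<>("TwitterProducer", "Hello World " + i);
    producer.send(record, (recordMetadata, e) -> {
        if (e == null) {
            // only records acknowledged by the broker are counted
            sentCount.incrementAndGet();
        }
    });
}

producer.flush();   // waits until outstanding sends (and their callbacks) have completed
producer.close();
System.out.println("Messages produced: " + sentCount.get());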
Or you can get the last offset using a Consumer client. Note that assignment() is only populated after a group rebalance completes, so rather than subscribe(), it is more reliable to look the partitions up and assign() them explicitly:

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "your-brokers");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

Consumer<String, String> consumer = new KafkaConsumer<>(props);
List<TopicPartition> partitions = new ArrayList<>();
for (PartitionInfo info : consumer.partitionsFor("topic_name")) {
    partitions.add(new TopicPartition(info.topic(), info.partition()));
}
consumer.assign(partitions);
consumer.seekToEnd(partitions);
for (TopicPartition tp : partitions) {
    long offsetPosition = consumer.position(tp); // end offset of this partition
}
I am trying to take JsonSerde input from a topic, process each record, and write it as an Avro message to a different topic using Kafka Streams. The output looks like binary and the data is not in actual JSON format. It looks like the default ByteArraySerde is being used for the key and value, and I don't know why, since I'm providing the serializer as SpecificAvroSerde.
private final static JsonSerde<JsonNode> jsonSerde = new JsonSerde<JsonNode>(JsonNode.class);
private static Map<String, Object> props;
// Serde of specific record
private static SpecificAvroSerde<SpecificRecord> productValueSerde;

@Bean(name = KafkaStreamsDefaultConfiguration.DEFAULT_STREAMS_CONFIG_BEAN_NAME)
public StreamsConfig kafkaStreamsConfig() throws UnknownHostException {
    props = new HashMap<>();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "*****processor-3");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:29092,localhost:19092,localhost:39092");
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, SpecificAvroSerde.class);
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, SpecificAvroSerde.class);
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:18081");
    props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
            "org.apache.kafka.clients.consumer.RoundRobinAssignor");

    productValueSerde = new SpecificAvroSerde<SpecificRecord>();
    productValueSerde.configure(Collections.singletonMap(
            AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG,
            "http://localhost:18081"), false);

    return new StreamsConfig(props);
}

@Bean
public KStream<JsonNode, JsonNode> KStream(StreamsBuilder kStreamBuilder) {
    KStream<JsonNode, JsonNode> stream = kStreamBuilder.stream("localtest", Consumed.with(jsonSerde, jsonSerde));
    try {
        KStream<JsonNode, SpecificRecord> avroStream = stream.flatMap((K, V) -> actNationalPaperHelper.mapToCoreAvro(K, V));
        // avroStream.flatMap((K,V)->System.out.println(V); return avroStream));
        avroStream.through("serdetest16", Produced.with(jsonSerde, productValueSerde));
    } catch (Exception e) {
        System.out.println(e);
    }
    return stream;
}
I have a use case where I need to read messages from Kafka and, for each message, extract data and invoke an Elasticsearch index. The response will be used for further processing.
I am getting the error below when invoking JavaEsSpark.esJsonRDD:
java.lang.ClassCastException: org.elasticsearch.spark.rdd.EsPartition incompatible with org.apache.spark.rdd.ParallelCollectionPartition
at org.apache.spark.rdd.ParallelCollectionRDD.compute(ParallelCollectionRDD.scala:102)
My code snippet is below
public static void main(String[] args) {
    if (args.length < 4) {
        System.err.println("Usage: JavaKafkaIntegration <zkQuorum> <group> <topics> <numThreads>");
        System.exit(1);
    }

    SparkConf sparkConf = new SparkConf().setAppName("JavaKafkaIntegration").setMaster("local[2]")
            .set("spark.driver.allowMultipleContexts", "true");
    // Settings used by JavaEsSpark.esJsonRDD
    sparkConf.set("es.nodes", <NODE URL>);
    sparkConf.set("es.nodes.wan.only", "true");

    context = new JavaSparkContext(sparkConf);
    // Create the context with a 2 second batch size
    JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(2000));

    int numThreads = Integer.parseInt(args[3]);
    Map<String, Integer> topicMap = new HashMap<>();
    String[] topics = args[2].split(",");
    for (String topic : topics) {
        topicMap.put(topic, numThreads);
    }

    // Receive messages from Kafka
    JavaPairReceiverInputDStream<String, String> messages =
            KafkaUtils.createStream(jssc, args[0], args[1], topicMap);

    JavaDStream<String> jsons = messages
            .map(new Function<Tuple2<String, String>, String>() {
                private static final long serialVersionUID = 1L;

                @Override
                public String call(Tuple2<String, String> tuple2) {
                    JavaRDD<String> esRDD = JavaEsSpark.esJsonRDD(context, <index>, <search string>).values();
                    return null;
                }
            });
    jsons.print();

    jssc.start();
    jssc.awaitTermination();
}
I get this error when invoking JavaEsSpark.esJsonRDD. Is this the correct way to do it? How do I successfully invoke ES from Spark?
I am running Kafka and Spark on Windows and invoking an external Elasticsearch index.
I wrote code to fetch Twitter tweets using Kafka. It works fine, but it does not work for partitions. I want to create 3 partitions for one topic. How do I pass the values to the partitioner class? Any suggestions on where I am going wrong?
public class kafkaSpoutFetchingRealTweets {
    private String consumerKey;
    private String consumerSecret;
    private String accessToken;
    private String accessTokenSecret;
    private TwitterStream twitterStream;

    /**
     * @param context
     */
    void start(final Context context) {
        /** Producer properties **/
        Properties props = new Properties();
        props.put("metadata.broker.list", context.getString(Constant.BROKER_LIST));
        props.put("partitioner.class", "SimplePartitioner");
        props.put("serializer.class", context.getString(Constant.SERIALIZER));
        props.put("request.required.acks", context.getString(Constant.REQUIRED_ACKS));
        props.put("producer.type", "async");
        // props.put("partitioner.class", context.getClass());

        ProducerConfig config = new ProducerConfig(props);
        final Producer<String, String> producer = new Producer<String, String>(config);

        /** Twitter properties **/
        consumerKey = context.getString(Constant.CONSUMER_KEY_KEY);
        consumerSecret = context.getString(Constant.CONSUMER_SECRET_KEY);
        accessToken = context.getString(Constant.ACCESS_TOKEN_KEY);
        accessTokenSecret = context.getString(Constant.ACCESS_TOKEN_SECRET_KEY);

        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.setOAuthConsumerKey(consumerKey);
        cb.setOAuthConsumerSecret(consumerSecret);
        cb.setOAuthAccessToken(accessToken);
        cb.setOAuthAccessTokenSecret(accessTokenSecret);
        cb.setJSONStoreEnabled(true);
        cb.setIncludeEntitiesEnabled(true);
        twitterStream = new TwitterStreamFactory(cb.build()).getInstance();

        /** Twitter listener **/
        StatusListener listener = new StatusListener() {
            // The onStatus method is executed every time a new tweet comes in.
            public void onStatus(Status status) {
                if (("en".equals(status.getLang())) && ("en".equals(status.getUser().getLang()))) {
                    KeyedMessage<String, String> data = new KeyedMessage<String, String>(
                            context.getString(Constant.data),
                            DataObjectFactory.getRawJSON(status));
                    producer.send(data);
                    System.out.println(DataObjectFactory.getRawJSON(status));
                }
            }

            public void onDeletionNotice(StatusDeletionNotice statusDeletionNotice) {
            }

            public void onTrackLimitationNotice(int numberOfLimitedStatuses) {
            }

            public void onScrubGeo(long userId, long upToStatusId) {
            }

            public void onException(Exception ex) {
                ex.printStackTrace();
                logger.info("Shutting down Twitter sample stream...");
                twitterStream.shutdown();
            }

            public void onStallWarning(StallWarning warning) {
                System.out.println("stallWarning");
            }
        };

        String[] lang = { "en" };
        fq.language(lang);
        twitterStream.addListener(listener);
        twitterStream.sample();
    }

    public static void main(String[] args) {
        try {
            Context context = new Context(args[0]);
            kafkaSpoutFetchingRealTweets tp = new kafkaSpoutFetchingRealTweets();
            tp.start(context);
        } catch (Exception e) {
            e.printStackTrace();
            logger.info(e.getMessage());
        }
    }
}
So there are a couple of problems.
Your question and your code don't match up. Your question asks about creating a topic with 3 partitions, but the code and example you provided deal with determining which partition a message should be sent to, assuming you have already created a topic with 3 partitions.
If you actually want to create a topic with 3 partitions, you need to use the command line client; a sample can be found in the quickstart at http://kafka.apache.org/documentation.html#quickstart and is sketched below.
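For example, with an older broker (the exact flags depend on your Kafka version: newer versions take --bootstrap-server instead of --zookeeper, and the topic name here is a placeholder):

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic your_topic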
If you actually just want to determine which partition to send data to, you'll need to provide more information about the problem you're encountering. Are all messages going to the same partition? Then look at how you're calculating the partition in the SimplePartitioner class that you specify in your config. What is in the SimplePartitioner class?
props.put("partitioner.class","SimplePartitioner");