Kafka AVRO Consumer: MySQL Decimal to Java Decimal

I'm trying to consume records from a MySQL table that contains 3 columns (Axis, Price, lastname) with datatypes (int, decimal(14,4), varchar(50)) respectively.
I inserted one record with the following data: (1, 5.0000, John).
The following Java code (which consumes the AVRO records from a topic created by a MySQL connector on the Confluent platform) reads the decimal column Price as a java.nio.HeapByteBuffer, so I can't access the value of the column when I receive it.
Is there a way to extract or convert the received data to a Java BigDecimal or double type?
Here is the MySQL connector properties file:
{
  "name": "mysql-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://localhost:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "incrementing.column.name": "Axis",
    "tasks.max": "1",
    "table.whitelist": "ticket",
    "mode": "incrementing",
    "topic.prefix": "mysql-",
    "name": "mysql-source",
    "validate.non.null": "false",
    "connection.url": "jdbc:mysql://localhost:3306/ticket?user=user&password=password"
  }
}
Here is the consumer code:
public static void main(String[] args) throws InterruptedException, IOException {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.StringDeserializer");
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
            "io.confluent.kafka.serializers.KafkaAvroDeserializer");
    props.put("schema.registry.url", "http://localhost:8081");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

    String topic = "sql-ticket";
    final Consumer<String, GenericRecord> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Arrays.asList(topic));

    try {
        while (true) {
            ConsumerRecords<String, GenericRecord> records = consumer.poll(100);
            for (ConsumerRecord<String, GenericRecord> record : records) {
                System.out.printf("value = %s \n", record.value().get("Price"));
            }
        }
    } finally {
        consumer.close();
    }
}

Alright, I finally found the solution.
The HeapByteBuffer needs to be converted to a byte[] array. I then used BigInteger, which constructs the value from that byte array, created a BigDecimal from the BigInteger, and set the decimal point with movePointLeft(4), where 4 is the scale of the column (in my case: 4). Everything worked as expected.
ByteBuffer buf = (ByteBuffer) record.value().get("Price");
byte[] arr = new byte[buf.remaining()];
buf.get(arr);
BigInteger bi = new BigInteger(1, arr);
BigDecimal bd = new BigDecimal(bi).movePointLeft(4); // 4 = scale of decimal(14,4)
System.out.println(bd);
Here are the results (left: consumer output, right: MySQL):
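As an alternative sketch (not part of the original answer), you can avoid hardcoding the scale by letting Avro's decimal logical-type conversion read the scale from the field's schema. This assumes the connector registered the Price column as an Avro "decimal" logical type over bytes, which is the default mapping for MySQL DECIMAL; the helper class name is just for illustration.

import java.math.BigDecimal;
import java.nio.ByteBuffer;
import org.apache.avro.Conversions;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;

public class PriceDecoder {
    // Reads the scale from the field's schema instead of hardcoding movePointLeft(4).
    // Assumes "Price" is a non-nullable decimal logical type; if the column is nullable,
    // the field schema is a union and must be unwrapped first.
    public static BigDecimal readPrice(GenericRecord value) {
        Schema priceSchema = value.getSchema().getField("Price").schema();
        ByteBuffer buf = (ByteBuffer) value.get("Price");
        return new Conversions.DecimalConversion()
                .fromBytes(buf, priceSchema, priceSchema.getLogicalType());
    }
}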

Related

get the unprocessed message count in spring kafka

We are migrating to Kafka, and I need to create a monitoring POC service that periodically checks the unprocessed message count in a Kafka topic and takes some action based on that count. This service must not read or process the messages; designated consumers will do that. On every cron run the service just needs the count of unprocessed messages present in the topic.
So far I have put this together from multiple examples:
public void stats() throws ExecutionException, InterruptedException {
    Map<String, Object> props = new HashMap<>();
    // list of host:port pairs used for establishing the initial connections to the Kafka cluster
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);

    try (final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
        consumer.subscribe(Arrays.asList(topicName));
        while (true) {
            Thread.sleep(1000);
            ConsumerRecords<String, String> records = consumer.poll(1000);
            if (!records.isEmpty()) {
                System.out.println("records is not empty = " + records.count() + " " + records);
            }
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
                Set<TopicPartition> partitions = consumer.assignment();
                //consumer.seekToBeginning(partitions);
                Map<TopicPartition, Long> offsets = consumer.endOffsets(partitions);
                for (TopicPartition partition : offsets.keySet()) {
                    OffsetAndMetadata commitOffset = consumer.committed(new TopicPartition(partition.topic(), partition.partition()));
                    Long lag = commitOffset == null ? offsets.get(partition) : offsets.get(partition) - commitOffset.offset();
                    System.out.println("lag = " + lag);
                    System.out.printf("partition %s is at %d\n", partition.topic(), offsets.get(partition));
                }
            }
        }
    }
}
The code works correctly sometimes, but sometimes it gives the wrong output. Please let me know what I am doing wrong.
Don't subscribe to the topic; just create a consumer with the same group to get the endOffsets.
See this answer for an example.
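A minimal sketch of that approach, assuming the question's topic and group id are passed in (class and method names here are placeholders): the monitoring consumer never subscribes or polls, it only asks the brokers for end offsets and the group's committed offsets and takes the difference.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class LagMonitor {
    // Computes consumer lag per partition without joining the group or reading any messages.
    public static Map<TopicPartition, Long> lag(String bootstrapServers, String groupId, String topic) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId); // same group as the real consumers
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> partitions = new ArrayList<>();
            for (PartitionInfo info : consumer.partitionsFor(topic)) {
                partitions.add(new TopicPartition(topic, info.partition()));
            }
            Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);

            Map<TopicPartition, Long> lag = new HashMap<>();
            for (TopicPartition tp : partitions) {
                OffsetAndMetadata committed = consumer.committed(tp);
                long committedOffset = committed == null ? 0L : committed.offset();
                lag.put(tp, endOffsets.get(tp) - committedOffset);
            }
            return lag;
        }
    }
}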

Empty data is returned when querying using Kafka tumbling window

I'm trying to query the state store to get the data in a 5-minute window; for that I'm using a tumbling window. I have added a REST endpoint to query the data.
I have stream A, which consumes data from topic1, performs some transformations, and outputs a key/value pair to topic2.
In stream B I'm doing the tumbling window operation on the topic2 data. When I run the code and query via REST, I see empty data in my browser, even though I can see data flowing into the state store.
What I've observed: when I use a producer class to inject data directly into topic2 instead of letting it come from stream A, I am able to query the data from the browser. But when topic2 gets its data from stream A, I get empty data.
Here is my stream A code:
public static void main(String[] args) {
    final StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> source = builder.stream("topic1");
    KStream<String, String> output = source
        .map((k, v) -> {
            Map<String, Object> Fields = new LinkedHashMap<>();
            Fields.put("FNAME", "ABC");
            Fields.put("LNAME", "XYZ");
            Map<String, Object> nFields = new LinkedHashMap<>();
            nFields.put("ADDRESS1", "HY");
            nFields.put("ADDRESS2", "BA");
            nFields.put("addF", Fields);
            Map<String, Object> eve = new LinkedHashMap<>();
            eve.put("nFields", nFields);
            Map<String, Object> fevent = new LinkedHashMap<>();
            fevent.put("eve", eve);
            LinkedHashMap<String, Object> newMap = new LinkedHashMap<>(fevent);
            return new KeyValue<>("JAY1234", newMap.toString());
        });
    output.to("topic2");
    // (building and starting the topology is omitted here, as in the question)
}
Here is my stream B code (where the tumbling window operation happens):
public static void main(String[] args) {
    final StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> eventStream = builder.stream("topic2");
    eventStream.groupByKey()
        .windowedBy(TimeWindows.of(300000))   // 5-minute tumbling window
        .reduce((v1, v2) -> v1 + ";" + v2, Materialized.as("TumblingWindowPoc"));
    final Topology topology = builder.build();
    KafkaStreams streams = new KafkaStreams(topology, props);
    streams.start();
}
REST code:
@GET
@Path("/{storeName}/{key}")
@Produces(MediaType.APPLICATION_JSON)
public List<KeyValue<String, String>> windowedByKey(@PathParam("storeName") final String storeName,
                                                    @PathParam("key") final String key) {
    final ReadOnlyWindowStore<String, String> store = streams.store(storeName,
            QueryableStoreTypes.<String, String>windowStore());
    if (store == null) {
        throw new NotFoundException();
    }
    long timeTo = System.currentTimeMillis();
    long timeFrom = timeTo - 30000;
    final WindowStoreIterator<String> results = store.fetch(key, timeFrom, timeTo);
    final List<KeyValue<String, String>> windowResults = new ArrayList<>();
    while (results.hasNext()) {
        final KeyValue<Long, String> next = results.next();
        windowResults.add(new KeyValue<String, String>(key + "#" + next.key, next.value));
    }
    return windowResults;
}
And this is what my key/value data looks like:
JAY1234 {eve = {nFields = {ADDRESS1 = HY,ADDRESS2 = BA,Fields = {FNAME = ABC,LNAME = XYZ,}}}}
I should be able to get the data when querying using REST. Any help is greatly appreciated.
Thanks!
To fetch a window, timeFrom should be before the window start. So if you want the data for the last 30 seconds, subtract the window duration when fetching, e.g. timeTo - 30000 - 300000, and then filter the required events out of the whole window's data.
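A minimal sketch of that adjustment inside the question's REST handler (the 300000 ms window size is taken from the question's TimeWindows.of(300000); store and key come from the surrounding method):

long windowSizeMs = 300000;                      // must match TimeWindows.of(300000) in stream B
long timeTo = System.currentTimeMillis();
long timeFrom = timeTo - 30000 - windowSizeMs;   // reach back past the window start

// fetch() returns windows whose start time falls in [timeFrom, timeTo];
// filter the returned entries afterwards if you only want the last 30 seconds.
final WindowStoreIterator<String> results = store.fetch(key, timeFrom, timeTo);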

Retrieve last n messages of Kafka consumer from a particular topic

Kafka version: 0.9.0.1
If n = 20, I have to get the last 20 messages of a topic.
I tried
kafkaConsumer.seekToBeginning();
but it retrieves all the messages. I need to get only the last 20 messages.
This topic may have hundreds of thousands of records.
public List<JSONObject> consumeMessages(String kafkaTopicName) {
    KafkaConsumer<String, String> kafkaConsumer = null;
    boolean flag = true;
    List<JSONObject> messagesFromKafka = new ArrayList<>();
    int recordCount = 0;
    int i = 0;
    int maxMessagesToReturn = 20;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "project.group.id");
    props.put("max.partition.fetch.bytes", "1048576000");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    kafkaConsumer = new KafkaConsumer<>(props);
    kafkaConsumer.subscribe(Arrays.asList(kafkaTopicName));
    TopicPartition topicPartition = new TopicPartition(kafkaTopicName, 0);
    LOGGER.info("Subscribed to topic " + kafkaConsumer.listTopics());

    while (flag) {
        // will consume all the messages and store in records
        ConsumerRecords<String, String> records = kafkaConsumer.poll(1000);
        kafkaConsumer.seekToBeginning(topicPartition);
        // getting total records count
        recordCount = records.count();
        LOGGER.info("recordCount " + recordCount);
        for (ConsumerRecord<String, String> record : records) {
            if (record.value() != null) {
                if (i >= recordCount - maxMessagesToReturn) {
                    // adding last 20 messages to messagesFromKafka
                    LOGGER.info("kafkaMessage " + record.value());
                    messagesFromKafka.add(new JSONObject(record.value()));
                }
                i++;
            }
        }
        if (recordCount > 0) {
            flag = false;
        }
    }
    kafkaConsumer.close();
    return messagesFromKafka;
}
You can use kafkaConsumer.seekToEnd(Collection<TopicPartition> partitions) to seek to the last offset of the given partition(s). As per the documentation:
"Seek to the last offset for each of the given partitions. This function evaluates lazily, seeking to the final offset in all partitions only when poll(Duration) or position(TopicPartition) are called. If no partitions are provided, seek to the final offset for all of the currently assigned partitions."
Then you can retrieve the position of a particular partition using position(TopicPartition partition).
Then you can subtract 20 from it and use kafkaConsumer.seek(TopicPartition partition, long offset) to get to the most recent 20 messages.
Simply,
kafkaConsumer.seekToEnd(partitionList);
long endPosition = kafkaConsumer.position(topicPartition);
long recentMessagesStartPosition = endPosition - maxMessagesToReturn;
kafkaConsumer.seek(topicPartition, recentMessagesStartPosition);
Now you can retrieve the most recent 20 messages using poll().
This is the basic logic, but if you have multiple partitions you have to handle those cases as well. I have not tried this, but I hope you get the concept.
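Here is a hedged end-to-end sketch of that approach for a single partition. It uses the newer consumer API, where seekToEnd takes a Collection and poll takes a Duration; on 0.9.0.1 the signatures differ slightly, so treat this as an outline rather than a drop-in. The class and group names are placeholders.

import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class LastNMessages {
    public static List<String> lastN(String bootstrapServers, String topic, int n) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", "last-n-reader");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        List<String> messages = new ArrayList<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition(topic, 0);   // single-partition case
            consumer.assign(Collections.singletonList(tp));     // assign, don't subscribe

            consumer.seekToEnd(Collections.singletonList(tp));
            long end = consumer.position(tp);
            long start = Math.max(0, end - n);                  // don't seek before offset 0
            consumer.seek(tp, start);

            // bounded poll loop: stop once the expected range is read or after a few attempts
            int attempts = 0;
            while (messages.size() < end - start && attempts++ < 10) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    messages.add(record.value());
                }
            }
        }
        return messages;
    }
}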

KStream to KTable Left Join Returns Null

I am currently trying to use a KStream-to-KTable join to enrich a Kafka topic. For my proof of concept I have a KStream with about 600,000 records, all of which have the same key, and a KTable created from a topic containing a single key/value record whose key matches the key of those 600,000 records.
When I use a left join (via the code below), the ValueJoiner receives NULL for every record.
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-pipe-json-parse-" + System.currentTimeMillis());
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "xxx.xx.xx.xxx:9092");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG, "org.apache.kafka.streams.processor.WallclockTimestampExtractor");
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 5);

final StreamsBuilder builder = new StreamsBuilder();

// Build a Kafka Stream from the Netcool Input Topic
KStream<String, String> source = builder.stream("output-100k");

// Join the KStream to the KTable
KStream<String, String> enriched_output = source
    .leftJoin(netcool_enrichment, (orig_msg, description) -> {
        String new_msg = jsonEnricher(orig_msg, description);
        if (description != null) {
            System.out.println("\n[DEBUG] Enriched Input Orig: " + orig_msg);
            System.out.println("[DEBUG] Enriched Input Desc: " + description);
            System.out.println("[DEBUG] Enriched Output: " + new_msg);
        }
        return new_msg;
    });
Here is a sample output record (using a forEach loop) from the source KStream:
[KSTREAM] Key: ismlogs
[KSTREAM] Value: {"severity":"debug","ingested_timestamp":"2018-07-18T19:32:47.227Z","#timestamp":"2018-06-28T23:36:31.000Z","offset":482,"#metadata":{"beat":"filebeat","topic":"input-100k","type":"doc","version":"6.2.2"},"beat":{"hostname":"abc.dec.com","name":"abc.dec.com","version":"6.2.2"},"source":"/root/100k-raw.txt","message":"Thu Jun 28 23:36:31 2018 Debug: Checking status of file /ism/profiles/active/test.xml","key":"ismlogs","tags":["ismlogs"]}
I have tried converting the KTable back to a KStream and using a forEach loop over the converted stream to verify that the records are actually there in the KTable.
KTable<String, String> enrichment = builder.table("enrichment");
KStream<String, String> ktable_debug = enrichment.toStream();
ktable_debug.foreach(new ForeachAction<String, String>() {
    public void apply(String key, String value) {
        System.out.println("[KTABLE] Key: " + key);
        System.out.println("[KTABLE] Value: " + value);
    }
});
The code above outputs:
[KTABLE] Key: "ismlogs"
[KTABLE] Value: "ISM Logs"
According to your console messages, the keys are different, and therefore they won't join:
[KSTREAM] Key: ismlogs
[KTABLE] Key: "ismlogs"
In the case of the KTable, the key is actually "ismlogs" with the double-quotes.
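One possible fix, sketched under the assumption that the quotes come from how the enrichment topic was produced (e.g. JSON-serialized string keys): strip them before materializing the table so the keys match the stream's plain-string keys. The topic name comes from the question; the re-keying approach itself is just one option.

// Re-key the enrichment topic so "\"ismlogs\"" becomes "ismlogs" before the join.
KTable<String, String> netcool_enrichment = builder
        .stream("enrichment", Consumed.with(Serdes.String(), Serdes.String()))
        .selectKey((key, value) -> key.replace("\"", ""))
        .groupByKey()
        .reduce((oldValue, newValue) -> newValue);   // keep the latest value per key, table-style

Alternatively, produce the enrichment topic with a plain String key (no JSON quoting) so no re-keying is needed at all.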

How to write Kafka Consumer Client in java to consume the messages from multiple brokers?

I was looking for a Java client (Kafka consumer) to consume messages from multiple brokers. Please advise.
Below is the code written to publish messages to multiple brokers using a simple partitioner.
The topic is created with replication factor "2" and 3 partitions.
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
    List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
    int numPartitions = partitions.size();
    logger.info("Number of Partitions " + numPartitions);
    if (keyBytes == null) {
        int nextValue = counter.getAndIncrement();
        List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
        if (availablePartitions.size() > 0) {
            int part = toPositive(nextValue) % availablePartitions.size();
            int selectedPartition = availablePartitions.get(part).partition();
            logger.info("Selected partition is " + selectedPartition);
            return selectedPartition;
        } else {
            // no partitions are available, give a non-available partition
            return toPositive(nextValue) % numPartitions;
        }
    } else {
        // hash the keyBytes to choose a partition
        return toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }
}
public void publishMessage(String message, String topic) {
    Producer<String, String> producer = null;
    try {
        producer = new KafkaProducer<>(producerConfigs());
        logger.info("Topic to publish the message --" + this.topic);
        for (int i = 0; i < 10; i++) {
            producer.send(new ProducerRecord<String, String>(this.topic, message));
            logger.info("Message Published Successfully");
        }
    } catch (Exception e) {
        logger.error("Exception Occured " + e.getMessage());
    } finally {
        producer.close();
    }
}
public Map<String, Object> producerConfigs() {
    loadPropertyFile();
    Map<String, Object> propsMap = new HashMap<>();
    propsMap.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
    propsMap.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    propsMap.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    propsMap.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, SimplePartitioner.class);
    propsMap.put(ProducerConfig.ACKS_CONFIG, "1");
    return propsMap;
}
public Map<String, Object> consumerConfigs() {
    Map<String, Object> propsMap = new HashMap<>();
    System.out.println("properties.getBootstrap()" + properties.getBootstrap());
    propsMap.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, properties.getBootstrap());
    propsMap.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    propsMap.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, properties.getAutocommit());
    propsMap.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, properties.getTimeout());
    propsMap.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    propsMap.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    propsMap.put(ConsumerConfig.GROUP_ID_CONFIG, properties.getGroupid());
    propsMap.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, properties.getAutooffset());
    return propsMap;
}
@KafkaListener(id = "ID1", topics = "${config.topic}", group = "${config.groupid}")
public void listen(ConsumerRecord<?, ?> record) {
    logger.info("Message Consumed " + record);
    logger.info("Partition From which Record is Received " + record.partition());
    this.message = record.value().toString();
}
bootstrap.servers = [localhost:9092, localhost:9093, localhost:9094]
If you use a regular Java consumer, it will automatically read from multiple brokers. There is no special code you need to write. Just subscribe to the topic(s) you want to consume and the consumer will connect to the corresponding brokers automatically. You only provide a "single entry point" broker -- the client figures out all other brokers of the cluster automatically.
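A minimal sketch of such a consumer (the topic and group names are placeholders; the broker list matches the question's bootstrap.servers): one or more brokers are listed purely as entry points, and the client discovers the rest of the cluster and all partition leaders on its own.

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class MultiBrokerConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Any subset of the cluster works as an entry point; the rest is discovered via metadata.
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092,localhost:9093,localhost:9094");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("my-topic"));   // reads from all partitions, whichever brokers lead them
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}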
The number of Kafka broker nodes in the cluster has nothing to do with consumer logic; the nodes are only used for fault tolerance and the bootstrap process. Placing messages in different partitions of a topic based on some custom logic is also not going to affect consumer logic. Even if you have a single consumer, that consumer will consume messages from all partitions of the subscribed topic. I suggest you check your code against a Kafka cluster with a single broker node...
