KStream to KTable Left Join Returns Null - java

I am trying to use a KStream-to-KTable join to enrich a Kafka topic. For my proof of concept, the KStream is built from a topic with about 600,000 records that all share the same key, and the KTable is built from a topic containing a single key/value pair whose key matches the key of those 600,000 records.
When I use a left join (via the code below), the table-side value passed to the ValueJoiner is NULL for every record.
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-pipe-json-parse-" + System.currentTimeMillis());
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "xxx.xx.xx.xxx:9092");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG, "org.apache.kafka.streams.processor.WallclockTimestampExtractor");
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 5);
final StreamsBuilder builder = new StreamsBuilder();
// Build a Kafka Stream from the Netcool Input Topic
KStream<String, String> source = builder.stream("output-100k");
// Join the KStream to the KTable
KStream<String, String> enriched_output = source
        .leftJoin(netcool_enrichment, (orig_msg, description) -> {
            String new_msg = jsonEnricher(orig_msg, description);
            if (description != null) {
                System.out.println("\n[DEBUG] Enriched Input Orig: " + orig_msg);
                System.out.println("[DEBUG] Enriched Input Desc: " + description);
                System.out.println("[DEBUG] Enriched Output: " + new_msg);
            }
            return new_msg;
        });
Here is a sample output record (using a forEach loop) from the source KStream:
[KSTREAM] Key: ismlogs
[KSTREAM] Value: {"severity":"debug","ingested_timestamp":"2018-07-18T19:32:47.227Z","#timestamp":"2018-06-28T23:36:31.000Z","offset":482,"#metadata":{"beat":"filebeat","topic":"input-100k","type":"doc","version":"6.2.2"},"beat":{"hostname":"abc.dec.com","name":"abc.dec.com","version":"6.2.2"},"source":"/root/100k-raw.txt","message":"Thu Jun 28 23:36:31 2018 Debug: Checking status of file /ism/profiles/active/test.xml","key":"ismlogs","tags":["ismlogs"]}
I have tried converting the KTable back to a KStream and running a forEach loop over the converted stream, and I can verify that the records are actually there in the KTable.
KTable<String, String> enrichment = builder.table("enrichment");
KStream<String, String> ktable_debug = enrichment.toStream();
ktable_debug.foreach(new ForeachAction<String, String>() {
    public void apply(String key, String value) {
        System.out.println("[KTABLE] Key: " + key);
        System.out.println("[KTABLE] Value: " + value);
    }
});
The code above outputs:
[KTABLE] Key: "ismlogs"
[KTABLE] Value: "ISM Logs"

According to your console messages, the keys are different, and therefore they won't join:
[KSTREAM] Key: ismlogs
[KTABLE] Key: "ismlogs"
In the case of the KTable, the key is actually "ismlogs" with the double-quotes.
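One way to work around this (a minimal sketch, not part of the original answer, assuming the double-quotes come from the producer JSON-encoding the keys of the enrichment topic) is to strip the quotes and rebuild the table before joining:
// Read the enrichment topic as a stream, strip the surrounding quotes from the key,
// and turn it back into a KTable by keeping the latest value per key.
KTable<String, String> netcool_enrichment = builder
        .stream("enrichment", Consumed.with(Serdes.String(), Serdes.String()))
        .selectKey((key, value) -> key.replace("\"", ""))
        .groupByKey()
        .reduce((oldValue, newValue) -> newValue);
The cleaner long-term fix is to change the upstream producer so it writes plain, unquoted string keys, which avoids the extra repartitioning step introduced by selectKey/groupByKey.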

Related

get the unprocessed message count in spring kafka

We are migrating to Kafka, and I need to create a monitoring POC service that periodically checks the count of unprocessed messages in a Kafka topic and takes some action based on that count. This service must not read or process the messages; designated consumers will do that. On every cron run the service just needs the count of unprocessed messages present in the queue.
So far I have put this together from multiple examples:
public void stats() throws ExecutionException, InterruptedException {
    Map<String, Object> props = new HashMap<>();
    // list of host:port pairs used for establishing the initial connections to the Kafka cluster
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    try (final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
        consumer.subscribe(Arrays.asList(topicName));
        while (true) {
            Thread.sleep(1000);
            ConsumerRecords<String, String> records = consumer.poll(1000);
            if (!records.isEmpty()) {
                System.out.println("records is not empty = " + records.count() + " " + records);
            }
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
                Set<TopicPartition> partitions = consumer.assignment();
                //consumer.seekToBeginning(partitions);
                Map<TopicPartition, Long> offsets = consumer.endOffsets(partitions);
                for (TopicPartition partition : offsets.keySet()) {
                    OffsetAndMetadata commitOffset = consumer.committed(new TopicPartition(partition.topic(), partition.partition()));
                    Long lag = commitOffset == null ? offsets.get(partition) : offsets.get(partition) - commitOffset.offset();
                    System.out.println("lag = " + lag);
                    System.out.printf("partition %s is at %d\n", partition.topic(), offsets.get(partition));
                }
            }
        }
    }
}
The code works fine sometimes, but sometimes it gives the wrong output. Please let me know what I am doing wrong.
Don't subscribe to the topic; just create a consumer with the same group to get the endOffsets.
See this answer for an example.
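A minimal sketch of that approach, reusing bootstrapServers, groupId and topicName from the question (the consumer is created with the monitored group's id but never subscribes or polls, so it does not take messages away from the real consumers):
public long unprocessedCount() {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
        // all partitions of the monitored topic
        List<TopicPartition> partitions = consumer.partitionsFor(topicName).stream()
                .map(pi -> new TopicPartition(pi.topic(), pi.partition()))
                .collect(Collectors.toList());
        Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);
        long totalLag = 0;
        for (TopicPartition tp : partitions) {
            // last offset committed by the real consumer group for this partition
            OffsetAndMetadata committed = consumer.committed(tp);
            long committedOffset = (committed == null) ? 0L : committed.offset();
            totalLag += endOffsets.get(tp) - committedOffset;
        }
        return totalLag;
    }
}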

Empty data is returned when querying using Kafka tumbling window

I'm trying to query the state store to get the data in a window of 5 minutes. For that I'm using a tumbling window, and I have added a REST endpoint to query the data.
I have stream A, which consumes data from topic1, performs some transformations, and outputs a key/value to topic2.
Now in stream B I'm doing a tumbling window operation on the topic2 data. When I run the code and query via REST, I see empty data in my browser, although I can see the data flowing into the state store.
What I've observed is that if, instead of topic2 getting data from stream A, I use a producer class to inject the data into topic2, I am able to query the data from the browser. But when topic2 gets its data from stream A, I get empty data.
Here is my stream A code :
public static void main(String[] args) {
    final StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> source = builder.stream("topic1");
    KStream<String, String> output = source
        .map((k, v) -> {
            Map<String, Object> Fields = new LinkedHashMap<>();
            Fields.put("FNAME", "ABC");
            Fields.put("LNAME", "XYZ");
            Map<String, Object> nFields = new LinkedHashMap<>();
            nFields.put("ADDRESS1", "HY");
            nFields.put("ADDRESS2", "BA");
            nFields.put("addF", Fields);
            Map<String, Object> eve = new LinkedHashMap<>();
            eve.put("nFields", nFields);
            Map<String, Object> fevent = new LinkedHashMap<>();
            fevent.put("eve", eve);
            LinkedHashMap<String, Object> newMap = new LinkedHashMap<>(fevent);
            return new KeyValue<>("JAY1234", newMap.toString());
        });
    output.to("topic2");
}
Here is my stream B code (where tumbling window operation happening):
public static void main(String[] args) {
    final StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> eventStream = builder.stream("topic2");
    eventStream.groupByKey()
        .windowedBy(TimeWindows.of(300000))
        .reduce((v1, v2) -> v1 + ";" + v2, Materialized.as("TumblingWindowPoc"));
    final Topology topology = builder.build();
    KafkaStreams streams = new KafkaStreams(topology, props);
    streams.start();
}
REST code :
#GET()
#Path("/{storeName}/{key}")
#Produces(MediaType.APPLICATION_JSON)
public List<KeyValue<String, String>> windowedByKey(#PathParam("storeName") final String storeName,
#PathParam("key") final String key) {
final ReadOnlyWindowStore<String, String> store = streams.store(storeName,
QueryableStoreTypes.<String, String>windowStore());
if (store == null) {
throw new NotFoundException(); }
long timeTo = System.currentTimeMillis();
long timeFrom = timeTo - 30000;
final WindowStoreIterator<String> results = store.fetch(key, timeFrom, timeTo);
final List<KeyValue<String,String>> windowResults = new ArrayList<>();
while (results.hasNext()) {
final KeyValue<Long, String> next = results.next();
windowResults.add(new KeyValue<String,String>(key + "#" + next.key, next.value));
}
return windowResults;
}
And this is how my key value data looks like :
JAY1234 {eve = {nFields = {ADDRESS1 = HY,ADDRESS2 = BA,Fields = {FNAME = ABC,LNAME = XYZ,}}}}
I should be able to get the data when querying using REST. Any help is greatly appreciated.
Thanks!
To fetch a window, timeFrom should be before the window start. So if you want the data for the last 30 seconds, subtract the window duration as well when fetching, i.e. use timeTo - 30000 - 300000 as timeFrom, and then filter the required events out of the whole window's data.
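A minimal sketch of the adjusted fetch inside the REST method above (assuming the 5-minute window size from stream B; next.key is the window start time):
long windowSizeMs = 300000L;                       // tumbling window size used in stream B
long timeTo = System.currentTimeMillis();
long timeFrom = timeTo - 30000L - windowSizeMs;    // reach back one extra window so the current window is included
final WindowStoreIterator<String> results = store.fetch(key, timeFrom, timeTo);
while (results.hasNext()) {
    final KeyValue<Long, String> next = results.next();
    // keep only the windows that overlap the last 30 seconds
    if (next.key + windowSizeMs >= timeTo - 30000L) {
        windowResults.add(new KeyValue<>(key + "#" + next.key, next.value));
    }
}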

Kafka AVRO Consumer: MySQL Decimal to Java Decimal

I'm trying to consume records from a MySQL table which contains 3 columns (Axis, Price, lastname) with their datatypes (int, decimal(14,4), varchar(50)) respectively.
I inserted one record which has the following data (1, 5.0000, John).
The following Java code (which consumes the Avro records from a topic created by a MySQL connector on the Confluent Platform) reads the decimal column Price as a java.nio.HeapByteBuffer, so I can't get at the value of the column when I receive it.
Is there a way to extract or convert the received data to a Java decimal or double data type?
Here is the MySQL connector configuration:
{
  "name": "mysql-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://localhost:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "incrementing.column.name": "Axis",
    "tasks.max": "1",
    "table.whitelist": "ticket",
    "mode": "incrementing",
    "topic.prefix": "mysql-",
    "name": "mysql-source",
    "validate.non.null": "false",
    "connection.url": "jdbc:mysql://localhost:3306/ticket?user=user&password=password"
  }
}
Here is the consumer code:
public static void main(String[] args) throws InterruptedException, IOException {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
            "org.apache.kafka.common.serialization.StringDeserializer");
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
            "io.confluent.kafka.serializers.KafkaAvroDeserializer");
    props.put("schema.registry.url", "http://localhost:8081");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

    String topic = "sql-ticket";
    final Consumer<String, GenericRecord> consumer = new KafkaConsumer<String, GenericRecord>(props);
    consumer.subscribe(Arrays.asList(topic));

    try {
        while (true) {
            ConsumerRecords<String, GenericRecord> records = consumer.poll(100);
            for (ConsumerRecord<String, GenericRecord> record : records) {
                System.out.printf("value = %s \n", record.value().get("Price"));
            }
        }
    } finally {
        consumer.close();
    }
}
Alright, so I finally found the solution.
The HeapByteBuffer needs to be converted to a byte[] array. I then used BigInteger, which constructs the value from that byte array, created a BigDecimal from the BigInteger, and set the decimal point with movePointLeft(4), where 4 is the scale (in my case). Everything worked as expected.
ByteBuffer buf = (ByteBuffer) record.value().get("Price");
byte[] arr = new byte[buf.remaining()];
buf.get(arr);
BigInteger bi = new BigInteger(1, arr);
BigDecimal bd = new BigDecimal(bi).movePointLeft(4);
System.out.println(bd);
Here are the results (left is the output, right is MySQL):
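As a possible refinement (my own suggestion, not part of the original answer), the scale can be read from the field's Avro schema instead of being hardcoded, assuming the connector registers Price as an Avro decimal logical type and the field is not declared nullable:
// Assumption: the Price field carries Avro's decimal logical type with its scale attached.
Schema priceSchema = record.value().getSchema().getField("Price").schema();
LogicalTypes.Decimal decimalType = (LogicalTypes.Decimal) priceSchema.getLogicalType();
int scale = decimalType.getScale();

ByteBuffer buf = (ByteBuffer) record.value().get("Price");
byte[] arr = new byte[buf.remaining()];
buf.get(arr);
BigDecimal bd = new BigDecimal(new BigInteger(1, arr), scale);
System.out.println(bd);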

How to join two Kafka streams and produce the result in a topic with Avro values

I've got two Kafka Streams with keys in String and values in Avro format which I have created using KSQL.
Here's the first one:
DESCRIBE EXTENDED STREAM_1;
Type : STREAM
Key field : IDUSER
Timestamp field : Not set - using <ROWTIME>
Key format : STRING
Value format : AVRO
Kafka output topic : STREAM_1 (partitions: 4, replication: 1)
Field | Type
--------------------------------------------------------
ROWTIME | BIGINT (system)
ROWKEY | VARCHAR(STRING) (system)
FIRSTNAME | VARCHAR(STRING)
LASTNAME | VARCHAR(STRING)
IDUSER | VARCHAR(STRING)
and the second one:
DESCRIBE EXTENDED STREAM_2;
Type : STREAM
Key field : IDUSER
Timestamp field : Not set - using <ROWTIME>
Key format : STRING
Value format : AVRO
Kafka output topic : STREAM_2 (partitions: 4, replication: 1)
Field | Type
--------------------------------------------------------
ROWTIME | BIGINT (system)
ROWKEY | VARCHAR(STRING) (system)
USERNAME | VARCHAR(STRING)
IDUSER | VARCHAR(STRING)
DEVICE | VARCHAR(STRING)
The desired output should include IDUSER, LASTNAME, DEVICE and USERNAME.
I want to left join these streams (on IDUSER) using Streams API and write the output into a kafka topic.
To do so, I've tried the following:
public static void main(String[] args) {
    final Properties streamsConfiguration = new Properties();
    streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "kafka-strteams");
    streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    streamsConfiguration.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "localhost:2181");
    streamsConfiguration.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081");
    streamsConfiguration.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    streamsConfiguration.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
    streamsConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

    final Serde<String> stringSerde = Serdes.String();
    final Serde<GenericRecord> genericAvroSerde = new GenericAvroSerde();
    boolean isKeySerde = false;
    genericAvroSerde.configure(
            Collections.singletonMap(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://localhost:8081"),
            isKeySerde);

    KStreamBuilder builder = new KStreamBuilder();
    KStream<String, GenericRecord> left = builder.stream("STREAM_1");
    KStream<String, GenericRecord> right = builder.stream("STREAM_2");

    // Java 8+ example, using lambda expressions
    KStream<String, GenericRecord> joined = left.leftJoin(right,
        (leftValue, rightValue) -> "left=" + leftValue + ", right=" + rightValue, /* ValueJoiner */
        JoinWindows.of(TimeUnit.MINUTES.toMillis(5)),
        Joined.with(
            stringSerde,       /* key */
            genericAvroSerde,  /* left value */
            genericAvroSerde)  /* right value */
    );

    joined.to(stringSerde, genericAvroSerde, "streams-output-testing");

    KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);
    streams.cleanUp();
    streams.start();
    Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
}
However,
KStream<String, GenericRecord> joined = ...
throws an error in my IDE:
incompatible types: inference variable VR has incompatible bounds
When I try to use a String Serde for both keys and values, it works but the data is not that readable from kafka-console-consumer. What I want to do is to produce the data in AVRO format in order to be able to read them off using kafka-avro-console-consumer.
My first guess is that you are returning a String from the join operation, whereas your code expects a GenericRecord as the result:
KStream<String, GenericRecord> joined = left.leftJoin(right,
(leftValue, rightValue) -> "left=" + leftValue + ", right=" + rightValue, ...)
Note how joined has type KStream<String, GenericRecord>, i.e. the value has type GenericRecord, but the join output is computed via "left=" + leftValue + ", right=" + rightValue, which has type String.
Instead of converting the values into a String, you can return one of the GenericRecord values directly.
For example:
KStream<String, GenericRecord> joined = left.leftJoin(right,
    (leftValue, rightValue) -> rightValue,            /* return the right-hand GenericRecord */
    JoinWindows.of(TimeUnit.MINUTES.toMillis(5)),
    Joined.with(stringSerde, genericAvroSerde, genericAvroSerde));
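If the joined record really needs to carry IDUSER, LASTNAME, DEVICE and USERNAME together, the ValueJoiner can build a new GenericRecord instead. A minimal sketch, assuming a hypothetical output schema built with Avro's SchemaBuilder (remember that rightValue can be null in a left join):
// Hypothetical output schema combining fields from both streams.
Schema joinedSchema = SchemaBuilder.record("JoinedUser").fields()
        .optionalString("IDUSER")
        .optionalString("LASTNAME")
        .optionalString("DEVICE")
        .optionalString("USERNAME")
        .endRecord();

ValueJoiner<GenericRecord, GenericRecord, GenericRecord> joiner = (leftValue, rightValue) -> {
    GenericRecord out = new GenericData.Record(joinedSchema);
    out.put("IDUSER", leftValue.get("IDUSER"));
    out.put("LASTNAME", leftValue.get("LASTNAME"));
    // rightValue is null when no STREAM_2 record falls inside the join window
    out.put("DEVICE", rightValue == null ? null : rightValue.get("DEVICE"));
    out.put("USERNAME", rightValue == null ? null : rightValue.get("USERNAME"));
    return out;
};

KStream<String, GenericRecord> joined = left.leftJoin(right, joiner,
        JoinWindows.of(TimeUnit.MINUTES.toMillis(5)),
        Joined.with(stringSerde, genericAvroSerde, genericAvroSerde));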

Kafka Streams: Grouping by a key in Json log

I have a Kafka Streams application with an input topic, input, on which the following records arrive as JSON logs:
JSON log:
{"CreationTime":"2018-02-12T12:32:31","UserId":"abc#gmail.com","Operation":"upload","Workload":"Drive"}
I am building a stream from the topic:
final StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> source_user_activity = builder.stream("input");
Next I want to group by "UserId" and count the records for each user.
final Serde<String> stringSerde = Serdes.String();
final Serde<Long> longSerde = Serdes.Long();
final StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> source_user_activity = builder.stream("input");

final KTable<String, Long> wordCounts = source_user_activity
    .flatMap((key, value) -> {
        List<KeyValue<String, String>> result = new LinkedList<>();
        JSONObject valueObject = new JSONObject(value);
        result.add(KeyValue.pair(valueObject.get("UserId").toString(), valueObject.toString()));
        return result;
    })
    .groupByKey()
    .count();

wordCounts.toStream().to("output", Produced.with(stringSerde, longSerde));
wordCounts.print();
Next I consume records from the output topic using the console consumer. I am not seeing any readable text; it's just something like this:
However wordCounts.print() shows this:
[KSTREAM-AGGREGATE-0000000003]: abc#gmail.com, (1<-null)
What am I doing wrong here? Thanks.
The value data is encoded as a long (you are using the Long serde for the value), and the console consumer uses a StringDeserializer by default, so it cannot correctly deserialize the value.
You need to specify the LongDeserializer for the value via a command-line argument of the console consumer.
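A minimal sketch of the console-consumer invocation (topic name from the question; the bootstrap server is assumed to be localhost:9092):
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic output --from-beginning \
    --property print.key=true \
    --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
    --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer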
