Use case: there is a topic with messages of the form (null, Metadata). I need to create a KTable from the topic, keyed by metadata.entity_id and with the metadata as the value. This table will later be used for a join with a stream keyed the same way.
private final static String KAFKA_BROKERS = "localhost:9092";
private final static String APPLICATION_ID = "TestMetadataTable";
private final static String AUTO_OFFSET_RESET_CONFIG = "earliest";
private final static String METADATA_TOPIC = "test-metadata-topic";
public static void main(String[] args) {
//Setting the Stream configuration params.
final Properties kafkaStreamConfiguration = new Properties();
kafkaStreamConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, APPLICATION_ID);
kafkaStreamConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG, APPLICATION_ID);
kafkaStreamConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, AUTO_OFFSET_RESET_CONFIG);
kafkaStreamConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, KAFKA_BROKERS);
//Creating Serdes for MetricMetadata
GenericJsonSerializer<MetricMetadata> metadataJsonSerializer = new GenericJsonSerializer<MetricMetadata>();
GenericJsonDeserializer<MetricMetadata> metadataJsonDeserializer = new GenericJsonDeserializer<MetricMetadata>(MetricMetadata.class);
Serde<MetricMetadata> metadataSerde = Serdes.serdeFrom(metadataJsonSerializer, metadataJsonDeserializer);
//Creating kafka stream.
final StreamsBuilder builder = new StreamsBuilder();
KTable<String, MetricMetadata> metaTable = builder.table(METADATA_TOPIC, Consumed.with(Serdes.String(), metadataSerde))
.groupBy((key, value) -> KeyValue.pair(value.getEntity_id(), value))
.aggregate( () -> null,
(key, value, aggValue) -> value,
(key, value, aggValue) -> value
);
final KafkaStreams streams = new KafkaStreams(builder.build(), kafkaStreamConfiguration);
streams.start();
}
Once I push a message to METADATA_TOPIC, it results in the error below. Am I missing something here? kafka-streams 2.2.0
Exception in thread "TestMetadataTable-StreamThread-1" org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Failed to flush state store test-metadata-topic-STATE-STORE-0000000000
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:242)
at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:204)
at org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:519)
at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:471)
at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:459)
at org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:286)
at org.apache.kafka.streams.processor.internals.TaskManager.commitAll(TaskManager.java:412)
at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:1057)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:911)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:805)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:774)
Caused by: org.apache.kafka.streams.errors.StreamsException: A serializer (key: org.apache.kafka.common.serialization.ByteArraySerializer / value: org.apache.kafka.streams.kstream.internals.ChangedSerializer) is not compatible to the actual key or value type (key type: java.lang.String / value type: org.apache.kafka.streams.kstream.internals.Change). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters.
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:94)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:183)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:162)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:122)
at org.apache.kafka.streams.kstream.internals.KTableRepartitionMap$KTableMapProcessor.process(KTableRepartitionMap.java:95)
at org.apache.kafka.streams.kstream.internals.KTableRepartitionMap$KTableMapProcessor.process(KTableRepartitionMap.java:72)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:117)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:183)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:162)
at org.apache.kafka.streams.kstream.internals.ForwardingCacheFlushListener.apply(ForwardingCacheFlushListener.java:42)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.putAndMaybeForward(CachingKeyValueStore.java:102)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.lambda$initInternal$0(CachingKeyValueStore.java:79)
at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:141)
at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:99)
at org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:124)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.flush(CachingKeyValueStore.java:127)
at org.apache.kafka.streams.state.internals.WrappedStateStore.flush(WrappedStateStore.java:72)
at org.apache.kafka.streams.state.internals.MeteredKeyValueStore.flush(MeteredKeyValueStore.java:224)
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:239)
... 10 more
Caused by: java.lang.ClassCastException: class java.lang.String cannot be cast to class [B (java.lang.String and [B are in module java.base of loader 'bootstrap')
at org.apache.kafka.common.serialization.ByteArraySerializer.serialize(ByteArraySerializer.java:21)
at org.apache.kafka.common.serialization.Serializer.serialize(Serializer.java:60)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:161)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:102)
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:89)
... 28 more
In this case, you need to provide Serdes to the KTable.groupBy() operation via Grouped as calling groupBy triggers a repartition. You'll also need to provide the same Serdes to the aggregate operation for the state store.
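For reference, a minimal sketch of that KTable path, reusing metadataSerde from the question (illustrative only; note the null-key caveat in the next paragraph):
KTable<String, MetricMetadata> metaTable = builder
        .table(METADATA_TOPIC, Consumed.with(Serdes.String(), metadataSerde))
        .groupBy((key, value) -> KeyValue.pair(value.getEntity_id(), value),
                 Grouped.with(Serdes.String(), metadataSerde))
        .aggregate(
                () -> null,                              // initializer
                (key, value, aggValue) -> value,         // adder: keep the latest metadata
                (key, oldValue, aggValue) -> aggValue,   // subtractor: keep the current aggregate
                Materialized.with(Serdes.String(), metadataSerde));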
Also, since the key is null, I think you should use a KStream initially. Then set the key (e.g. with selectKey), call groupByKey (you still need to provide Serdes via Grouped), and the aggregation will give you the KTable you want.
Off the top of my head, something like this should work
builder.stream(METADATA_TOPIC, Consumed.with(Serdes.String(), metadataSerde))
       .selectKey((key, value) -> value.getEntity_id())
       .groupByKey(Grouped.with(Serdes.String(), metadataSerde))
       .aggregate(() -> null,
                  (key, value, aggValue) -> value,
                  Materialized.with(Serdes.String(), metadataSerde));
Related
I would like to create a KTable<String, List<TimePeriod>>. When the app receives a record Record<String, TimePeriod>, it should add the timePeriod to the corresponding list if the key already exists; otherwise it should create a new list containing this timePeriod.
KTable<String, List<TimePeriod>> kTable = streamsBuilder(topologyConfig.exemptionsTopic(), Consumed.with(Serdes.String(), new TimePeriodSerde())
.groupByKey()
.aggregate(
ArrayList::new, /*Initializer*/
(key, timePeriod, timePeriodList) -> { /* aggregator */
timePeriodList.add(timePeriod);
return timePeriodList;
}, Materialized.<String,List<TimePeriod>, KeyValueStore<Bytes, byte[]>>as("storeName")
.withValueSerde(Serdes.ListSerde(ArrayList.class, new TimePeriodSerde())));
I'm not quite sure this is proper code.
Later I would like to join a stream <String, Product> with this kTable. Assume that the key in the kStream and in the kTable is productName.
I would like to join a product with a time period if product.getProductionDate() lies in any of these timePeriods. So, something like this...
ValueJoiner<Product, List<TimePeriod>, JoinedProduct> joiner = (product, periodList) -> {
for(TimePeriod period : periodList) {
if (product.isInRange(period)) {
return new JoinedProduct(product, period);
}
}
return new JoinedProduct(product, null);
};
streamsBuilder.stream(topologyConfig.productsTopic(), Consumed.with(Serdes.String(), new ProductSerde()))
.leftJoin(kTable, joiner)
.to(...);
When I try to execute this, there is an error while building the project, but IntelliJ IDEA doesn't underline anything.
I lean more toward there being a mistake in creating the kTable.
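For reference, a compile-clean sketch of the table construction (my reconstruction, not the original poster's code; it assumes TimePeriod, TimePeriodSerde and topologyConfig exist as described, and that the missing .stream(...) call and unbalanced parentheses above are transcription slips):
KTable<String, List<TimePeriod>> kTable = streamsBuilder
        .stream(topologyConfig.exemptionsTopic(),
                Consumed.with(Serdes.String(), new TimePeriodSerde()))
        .groupByKey(Grouped.with(Serdes.String(), new TimePeriodSerde()))
        .aggregate(
                ArrayList::new,                        // initializer: an empty list per key
                (key, timePeriod, timePeriodList) -> { // aggregator: append the new period
                    timePeriodList.add(timePeriod);
                    return timePeriodList;
                },
                Materialized.<String, List<TimePeriod>, KeyValueStore<Bytes, byte[]>>as("storeName")
                        .withKeySerde(Serdes.String())
                        .withValueSerde(Serdes.ListSerde(ArrayList.class, new TimePeriodSerde())));
The leftJoin itself may additionally need explicit serdes (e.g. via Joined.with(...)) if the default serdes don't match the String/Product types, but that is a guess without seeing the actual build error.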
I'm attempting to reproduce this example. My topology is:
@Bean("myTopo")
public KStream<Object, Object> getTopo(@Qualifier("myKConfig") StreamsBuilder builder) {
var stream = builder.stream("my-events");
stream.groupByKey()
.windowedBy(TimeWindows.of(Duration.ofMinutes(2)))
.count()
.suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
.toStream()
.foreach((k, v) -> {
System.out.println("k + v = " + k + " --- " + v);
});
return stream;
}
I've set the serde and the windowed serde inner classes in the config:
...
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, JsonSerde.class);
...
props.put(JsonDeserializer.VALUE_DEFAULT_TYPE, JsonNode.class);
props.put(StreamsConfig.DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_WINDOWED_VALUE_SERDE_INNER_CLASS, JsonSerde.class);
var config = new KafkaStreamsConfiguration(props);
return new StreamsBuilderFactoryBean(config);
The error I get is
org.apache.kafka.streams.errors.StreamsException: ClassCastException invoking Processor.
Do the Processor's input types match the deserialized types?
Check the Serde setup and change the default Serdes in StreamConfig or provide correct Serdes via method parameters.
Make sure the Processor can accept the deserialized input of type key:
org.apache.kafka.streams.kstream.Windowed,
and value:
org.apache.kafka.streams.kstream.internals.Change.
with underlying cause
java.lang.ClassCastException: class org.apache.kafka.streams.kstream.Windowed
cannot be cast to class java.lang.String (org.apache.kafka.streams.kstream.Windowed is in unnamed module of loader 'app';
java.lang.String is in module java.base of loader 'bootstrap')
I see count() returns KTable<Windowed<Object>, Long>. So it looks like the problem is it wants a Windowed<String> serde for the key. Apparently, DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS isn't sufficient.
How do I create and set this?
I think I ran into this bug:
https://issues.apache.org/jira/browse/KAFKA-9259
I added a Materialized to the count() method
var store = Stores.persistentTimestampedWindowStore(
"some-state-store",
Duration.ofMinutes(5),
Duration.ofMinutes(2),
false);
var materialized = Materialized
.<String, Long>as(store)
.withKeySerde(Serdes.String());
Now the code runs without exception.
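For context, this is roughly how the store and Materialized wire back into the topology from the question (a sketch; typing the source stream with String keys and JsonNode values is my assumption, not something stated in the original post):
// assuming the source stream can be typed per the default serdes configured above
KStream<String, JsonNode> stream = builder.stream("my-events");

stream.groupByKey()
      .windowedBy(TimeWindows.of(Duration.ofMinutes(2)))
      .count(materialized) // the Materialized<String, Long, WindowStore<Bytes, byte[]>> built above
      .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
      .toStream()
      .foreach((k, v) -> System.out.println("k + v = " + k + " --- " + v));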
I have two DataStreams: the first one, DataStream<String> source, receives records from a message broker, and the second one, SingleOutputStreamOperator<Event> events, is the result of mapping source into Event.class.
Some use cases need SingleOutputStreamOperator<Event> events and others need DataStream<String> source. In one of the use cases that works on DataStream<String> source, I apply some filters to get a SingleOutputStreamOperator<String> result. To avoid mapping the source into Event.class again (that operation is already done and I have that stream), I need to look up each record of the filtered result in SingleOutputStreamOperator<Event> events by id, and then apply another map to produce a SingleOutputStreamOperator<EventOutDto>.
This is the idea as example:
DataStream<String> source = env.readFrom(source);
SingleOutputStreamOperator<Event> events = source.map(s -> mapper.readValue(s, Event.class));

public void filterAndJoin(DataStream<String> source, SingleOutputStreamOperator<Event> events) {
    SingleOutputStreamOperator<String> filtered = source.filter(new FilterFunction());
    SingleOutputStreamOperator<EventOutDto> result = (this will be the result of searching each record
        of the filtered stream, by id, in the events stream, where the ids must match and the event is returned if found)
        .map(event -> new EventOutDto(event)).addSink(new RichSinkFunction());
}
I have this code:
filtered.join(events)
.where(k -> {
JsonNode tree = mapper.readTree(k);
String id = "";
if (tree.get("Id") != null) {
id = tree.get("Id").asText();
}
return id;
})
.equalTo(e -> {
return e.Id;
})
.window(TumblingEventTimeWindows.of(Time.seconds(1)))
.apply(new JoinFunction<String, Event, EventOutDto>() {
@Override
public EventOutDto join(String s, Event event) throws Exception {
return new EventOutDto(event);
}
})
.addSink(new SinkFunction());
In the above code everything runs and the ids are the same, so the where(id).equalTo(id) should match, but the processing never reaches the apply function.
Observation: watermarks are assigned with the same timestamp.
Questions:
Any idea why?
Did I explain myself clearly?
I solved the join by doing this:
SingleOutputStreamOperator<ObjectDTO> triggers = candidates
.keyBy(new KeySelector())
.intervalJoin(keyedStream.keyBy(e -> e.Id))
.between(Time.milliseconds(-2), Time.milliseconds(1))
.process(new ProcessFunctionOne())
.keyBy(k -> k.otherId)
.process(new ProcessFunctionTwo());
I really need help!
I can't extract the timestamp of a message sent by a producer. In my project I work with JSON: I have one class that defines the keys and one that defines the values of the message I send via a producer to a "Raw" topic, and two other classes that do the same for the output message my consumer reads on the topic called "Tdt". In the main class KafkaStreams.java I define the stream and map the keys and values. Starting Kafka locally, I start a producer that writes a message on the "raw" topic with keys and values; then, in another shell, the consumer starts reading the output message on the "tdt" topic. How do I get the event timestamp? I need to know the timestamp at which the message was sent by the producer. Do I need a TimestampExtractor?
Here is my main KafkaStreams class (my application works great, I just need the timestamp):
@Bean("app1StreamTopology")
public KStream<LibAssIbanRawKey, LibAssIbanRawValue> kStream() throws ParseException {
JsonSerde<Dwsitspr4JoinValue> Dwsitspr4JoinValueSerde = new JsonSerde<>(Dwsitspr4JoinValue.class);
KStream<LibAssIbanRawKey, LibAssIbanRawValue> stream = defaultKafkaStreamsBuilder.stream(inputTopic);
stream.peek((k,v) -> logger.info("Debug3 Chiave descrizione -> ({})",v.getCATRAPP()));
GlobalKTable<Integer, Dwsitspr4JoinValue> categoriaRapporto = defaultKafkaStreamsBuilder
.globalTable(temptiptopicname,
Consumed.with(Serdes.Integer(), Dwsitspr4JoinValueSerde)
// .withOffsetResetPolicy(Topology.AutoOffsetReset.EARLIEST)
);
logger.info("Debug3 Chiave descrizione -> ({})",categoriaRapporto.toString()) ;
stream.peek((k,v) -> logger.info("Debug4 Chiave descrizione -> ({})",v.getCATRAPP()) );
stream
.join(categoriaRapporto, (k, v) -> v.getCATRAPP(), (valueStream, valueGlobalKtable) -> {
// Value mapping
LibAssIbanTdtValue newValue = new LibAssIbanTdtValue();
newValue.setDescrizioneRidottaCodiceCategoriaDelRapporto(valueGlobalKtable.getDescrizioneRidotta());
newValue.setDescrizioneEstesaCodiceCategoriaDelRapporto(valueGlobalKtable.getDescrizioneEstesa());
newValue.setIdentificativo(valueStream.getAUD_CCID());
.
.
.//Other Value Mapped
.
.
.map((key, value) -> {
// Key mapping
LibAssIbanTdtKey newKey = new LibAssIbanTdtKey();
newKey.setData(dtf.format(localDate));
newKey.setIdentificatoreUnivocoDellaRigaDiTabella(key.getTABROWID());
return KeyValue.pair(newKey, value);
}).to(outputTopic, Produced.with(new JsonSerde<>(LibAssIbanTdtKey.class), new JsonSerde<>(LibAssIbanTdtValue.class)));
return stream;
}
}
Yes, you need a TimestampExtractor.
public class YourTimestampExtractor implements TimestampExtractor {
@Override
public long extract(ConsumerRecord<Object, Object> consumerRecord, long l) {
// do whatever you want with the timestamp available with consumerRecord.timestamp()
...
// return here the timestamp you want to use (here default)
return consumerRecord.timestamp();
}
}
You'll also need to tell Kafka Streams which extractor to use, via the StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG property.
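For example, a minimal sketch of wiring it in (assuming props is the same Properties object used to build the streams configuration):
// register the custom extractor in the Kafka Streams configuration
props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, YourTimestampExtractor.class);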
I have a Kafka stream that receives records, and I want to concatenate messages based on a particular field.
A message in the stream looks like the following:
Key: 2099
Payload{
email: tom@example.com
eventCode: 2099
}
Expected Output:
key: 2099
Payload{
emails: tom@example.com, bill@acme.com, jane@example.com
}
I can get the stream to run fine; I'm just not sure what the lambda should contain.
This is what I have done so far. I am not sure whether I should use map, aggregate, or reduce, or some combination of those operations.
final StreamsBuilder builder = new StreamsBuilder();
KStream<String, Payload> inputStream = builder.stream(INPUT_TOPIC);
inputStream
.groupByKey()
.windowedBy(TimeWindows.of(TimeUnit.MINUTES.toMillis(300000)))
// Not sure what to do here …..
}).to (OUTPUT_TOPIC );
It could be something like this
inputStream.groupByKey().windowedBy(TimeWindows.of(TimeUnit.MINUTES.toMillis(300000)))
.aggregate(PayloadAggr::new, new Aggregator<String, Payload, PayloadAggr>() {
@Override
public PayloadAggr apply(String key, Payload newValue, PayloadAggr result) {
result.setKey(key);
if(result.getEmails()==null){
result.setEmails(newValue.getEmail());
}else{
result.setEmails(result.getEmails() + "," + newValue.getEmail());
}
return result;
}
}, ... /* your serdes and store */).toStream().to(OUTPUT_TOPIC);
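For completeness, a hedged sketch of what the elided serdes/store argument could look like; the store name and payloadAggrSerde are hypothetical placeholders, not part of the original answer:
// Hypothetical: materialize the windowed aggregate with explicit serdes and a named store
Materialized<String, PayloadAggr, WindowStore<Bytes, byte[]>> materialized =
        Materialized.<String, PayloadAggr, WindowStore<Bytes, byte[]>>as("emails-agg-store")
                .withKeySerde(Serdes.String())
                .withValueSerde(payloadAggrSerde); // e.g. a JSON serde for PayloadAggr
This materialized instance would then be passed as the third argument to aggregate(...) in place of the ellipsis, and the windowed result streamed to OUTPUT_TOPIC as above.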