I have an implementation that uses a KTable and CloudEvents to produce events, but for some unknown reason the events produced from the KTable are not formatted as CloudEvents. The implementation is as follows:
public void initKafkaStream() {
StreamsBuilder streamsBuilder = new StreamsBuilder();
PojoCloudEventDataMapper<TicketEvent> ticketEventMapper = PojoCloudEventDataMapper.from(objectMapper, TicketEvent.class);
KStream<String, CloudEvent> rawTicketStream = streamsBuilder.stream(rawTicketEvent, Consumed.with(Serdes.String(), cloudEventSerde));
rawTicketStream
.mapValues(e -> convertToPojo(e, ticketEventMapper))
.filter((k, v) -> v != null)
.groupByKey()
.aggregate(
AggregatedTicketEvent::new,
(key, val, agg) -> doAggregation(agg, val),
Materialized
.<String, AggregatedTicketEvent, KeyValueStore<Bytes, byte[]>>as("aggregatedTicket")
.withValueSerde(aggregatedTicketEventSerde)
.withLoggingDisabled()
)
.mapValues(result -> {
try {
return CloudEventBuilder.v1()
.withId(UUID.randomUUID().toString())
.withType("ticket_update")
.withSource(sourceTemplate.expand(result.getCurrent().getId()))
.withTime(result.getMeta().getOccurredAt())
.withData(objectMapper.writeValueAsBytes(result))
.withDataContentType("application/json")
.build();
} catch (JsonProcessingException e) {
throw new RuntimeException(e);
}
})
.toStream()
.to(aggregatedTicketEvent, Produced.with(Serdes.String(), cloudEventSerde));
streams = new KafkaStreams(streamsBuilder.build(streamsConfig), streamsConfig);
streams.setUncaughtExceptionHandler(ex -> StreamThreadExceptionResponse.REPLACE_THREAD);
streams.start();
}
Has anyone had such an issue?
Thanks in advance
The issue was that the serializer/deserializer configuration I had put in the props was being overwritten by Kafka Streams, which by default sets the format to Encoding.BINARY. With binary encoding, the CloudEvents attributes are carried only in the record headers instead of in the payload. To make sure the serializers have the correct configuration, I set it directly on the CloudEventSerializer and CloudEventDeserializer. The Serdes.serdeFrom() call then looks like this:
Map<String, Object> ceSerializerConfigs = new HashMap<>();
ceSerializerConfigs.put(ENCODING_CONFIG, Encoding.STRUCTURED);
ceSerializerConfigs.put(EVENT_FORMAT_CONFIG, JsonFormat.CONTENT_TYPE);
CloudEventSerializer serializer = new CloudEventSerializer();
serializer.configure(ceSerializerConfigs, false);
CloudEventDeserializer deserializer = new CloudEventDeserializer();
deserializer.configure(ceSerializerConfigs, false);
this.cloudEventSerde = Serdes.serdeFrom(serializer, deserializer);
To get the CloudEvents format as a JSON payload, you have to use Encoding.STRUCTURED together with the JSON event format. The record value then contains the full JSON envelope (specversion, id, type, source, data, and so on) with content type application/cloudevents+json, whereas in binary mode those attributes travel in ce_-prefixed headers and the value holds only the data.
Hope this will help someone who's struggling with this issue!
Best,
Related
I'm using Apache Kafka Streams, and I added a transform to my stream:
final StreamsBuilder streamsBuilder = new StreamsBuilder();
final StoreBuilder<KeyValueStore<String, byte[]>> correlationStore =
Stores.keyValueStoreBuilder(
Stores.persistentKeyValueStore(STORE_NAME),
Serdes.String(),
Serdes.ByteArray());
streamsBuilder.addStateStore(correlationStore);
streamsBuilder.stream(topicName, inputConsumed)
.peek(InboundPendingMessageStreamer::logEntries)
.transform(() -> new CleanerTransformer<String, byte[], KeyValue<String, byte[]>>(Duration.ofMillis(5000), STORE_NAME), STORE_NAME)
.toTable();
I'm having difficulty understanding the CleanerTransformer class I created. In its init method, I set up a schedule with a scanFrequency and a PunctuationType:
@Override
public void init(ProcessorContext context) {
this.stateStore = context.getStateStore(purgeStoreName);
context.schedule(scanFrequency, PunctuationType.STREAM_TIME, timestamp -> {
try (final KeyValueIterator<K, byte[]> all = stateStore.all()) {
while (all.hasNext()) {
final var headers = context.headers();
final KeyValue<K, byte[]> record = all.next();
}
}
});
}
When I add an event to the stream, I do get the message in the schedule callback, but it's only executed once.
My understanding was that it should be executed at every interval configured by scanFrequency.
Any idea what I'm doing wrong here?
I would like to combine many records into a single output message per key. I tried several approaches, including custom reducers and aggregators, but they all still emit one output record per input record. For example, I would like to convert many strings into just one string. If my stream has messages with the same key but different values, "the", "sky", "is", "blue", then I would like to output one concatenation of them to a new topic: "the,sky,is,blue,". What I am getting instead is 4 messages: "the,", "the,sky,", "the,sky,is,", "the,sky,is,blue,". When I send a second message, it concatenates onto the previous aggregation and I eventually receive "the,sky,is,blue,the,sky,is,blue,".
I also tried using a custom storebuilder and changing a lot of the settings to see if that would do anything.
Map<String, String> changelogConfig = new HashMap<>();
changelogConfig.put("message.down.conversion.enable", "true");
changelogConfig.put("flush.messages", "0");
changelogConfig.put("flush.ms", "0");
StoreBuilder<KeyValueStore<String, String>> aggStoreSupplier = Stores.keyValueStoreBuilder(
Stores.persistentKeyValueStore("AggStore"),
Serdes.String(),
Serdes.String())
.withLoggingEnabled(changelogConfig);
KStream<String, String> results = source // each incoming message gets processed and eventually yields the string results I need to concatenate
.groupByKey() // this KGroupedStream holds the N records that were sent in the message
.reduce(new Reducer<String>() {
@Override
public String apply(String aggValue, String value) {
return value + "," + aggValue;
}
}, Materialized.as("AggStore"))
.toStream();
results.to("results", Produced.with(Serdes.String(), Serdes.String()));
final Topology topology = builder.build(); // to describe topology
System.out.println(topology.describe()); // to print description
final KafkaStreams streams = new KafkaStreams(topology, props);
final CountDownLatch latch = new CountDownLatch(1);
// attach shutdown handler to catch control-c
Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
@Override
public void run() {
streams.close();
latch.countDown();
}
});
try {
streams.cleanUp();
streams.start();
latch.await();
} catch (Throwable e) {
System.exit(1);
}
System.exit(0);
I need to configure the retention policy of a particular topic during creation. I tried to look for a solution, but I could only find the command-level alter command below:
./bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic --config retention.ms=1680000
Can someone let me know a way to configure it during creation, something like an XML or properties file configuration in spring-mvc?
Spring Kafka lets you create new topics by declaring @Beans in your application context. This requires a bean of type KafkaAdmin in the application context, which is created automatically when using Spring Boot. You could define your topic as follows:
@Bean
public NewTopic myTopic() {
return TopicBuilder.name("my-topic")
.partitions(4)
.replicas(3)
.config(TopicConfig.RETENTION_MS_CONFIG, "1680000")
.build();
}
If you are not using Spring Boot, you'll additionally have to define the KafkaAdmin bean:
@Bean
public KafkaAdmin admin() {
Map<String, Object> configs = new HashMap<>();
configs.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,"localhost:9092");
return new KafkaAdmin(configs);
}
If you want to edit the configuration of an existing topic, you'll have to use the AdminClient. Here's a snippet that changes retention.ms at the topic level:
Map<String, Object> config = new HashMap<>();
config.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,"localhost:9092");
AdminClient client = AdminClient.create(config);
ConfigResource resource = new ConfigResource(ConfigResource.Type.TOPIC, "new-topic");
// Update the retention.ms value
ConfigEntry retentionEntry = new ConfigEntry(TopicConfig.RETENTION_MS_CONFIG, "1680000");
AlterConfigOp op = new AlterConfigOp(retentionEntry, AlterConfigOp.OpType.SET);
Map<ConfigResource, Collection<AlterConfigOp>> configs = new HashMap<>(1);
configs.put(resource, Arrays.asList(op));
AlterConfigsResult alterConfigsResult = client.incrementalAlterConfigs(configs);
alterConfigsResult.all().get(); // all() returns a KafkaFuture; get() blocks until the update is applied
The configuration can also be applied automatically using this @PostConstruct method, which takes in the NewTopic beans.
@Autowired
private Set<NewTopic> topics;
@PostConstruct
public void reconfigureTopics() throws ExecutionException, InterruptedException {
try (final AdminClient adminClient = AdminClient.create(Map.of(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaBootstrapServers))) {
adminClient.incrementalAlterConfigs(topics.stream()
.filter(topic -> topic.configs() != null)
.collect(Collectors.toMap(
topic -> new ConfigResource(ConfigResource.Type.TOPIC, topic.name()),
topic -> topic.configs().entrySet()
.stream()
.map(e -> new ConfigEntry(e.getKey(), e.getValue()))
.peek(ce -> log.debug("configuring {} {} = {}", topic.name(), ce.name(), ce.value()))
.map(ce -> new AlterConfigOp(ce, AlterConfigOp.OpType.SET))
.collect(Collectors.toList())
)))
.all()
.get();
}
}
I guess you could use the AdminClient (https://kafka.apache.org/22/javadoc/index.html?org/apache/kafka/clients/admin/AdminClient.html) for this. You can create an AdminClient instance in your application and use the create-topics or alter-configs calls to manipulate topic configurations, including retention.
To create a topic using AdminClient programmatically with the specified retention time, do the following:
NewTopic topic = new NewTopic(topicName, numPartitions, replicationFactor);
topic.configs(Map.of(TopicConfig.RETENTION_MS_CONFIG, retentionMs.toString()));
adminClient.createTopics(List.of(topic));
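Note that createTopics() is asynchronous; if you want to block until the topic actually exists and surface any failure, you can wait on the returned result, e.g.:
adminClient.createTopics(List.of(topic)).all().get(); // blocks; get() throws ExecutionException if creation failed, e.g. the topic already exists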
This is my first attempt at using a KTable. I have a Kafka stream that contains Avro-serialized objects of type A,B, and this works fine. I can write a consumer that consumes just fine, or a simple KStream that simply counts records.
The B object has a field containing a country code. I'd like to supply that code to a KTable so it can count the number of records that contain a particular country code. To do so I'm trying to convert the stream into a stream of X,Y (or really: country-code, count). Eventually I look at the contents of the table and extract an array of KV pairs.
The code I have (included) always errors out with the following (see the line with 'Caused by'):
2018-07-26 13:42:48.688 [com.findology.tools.controller.TestEventGeneratorController-16d7cd06-4742-402e-a679-898b9ef78c41-StreamThread-1; AssignedStreamsTasks] ERROR -- stream-thread [com.findology.tools.controller.TestEventGeneratorController-16d7cd06-4742-402e-a679-898b9ef78c41-StreamThread-1] Failed to process stream task 0_0 due to the following error:
org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_0, processor=KSTREAM-SOURCE-0000000000, topic=com.findology.model.traffic.CpaTrackingCallback, partition=0, offset=962649
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:240)
at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.process(AssignedStreamsTasks.java:94)
at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:411)
at org.apache.kafka.streams.processor.internals.StreamThread.processAndMaybeCommit(StreamThread.java:922)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:802)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:749)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:719)
Caused by: org.apache.kafka.streams.errors.StreamsException: A serializer (key: org.apache.kafka.common.serialization.ByteArraySerializer / value: org.apache.kafka.common.serialization.ByteArraySerializer) is not compatible to the actual key or value type (key type: java.lang.Integer / value type: java.lang.Integer). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters.
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:92)
at org.apache.kafka.streams.processor.internals.AbstractProcessorContext.forward(AbstractProcessorContext.java:174)
at org.apache.kafka.streams.kstream.internals.KStreamFilter$KStreamFilterProcessor.process(KStreamFilter.java:43)
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:46)
at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:211)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:124)
at org.apache.kafka.streams.processor.internals.AbstractProcessorContext.forward(AbstractProcessorContext.java:174)
at org.apache.kafka.streams.kstream.internals.KStreamTransform$KStreamTransformProcessor.process(KStreamTransform.java:59)
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:46)
at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:211)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:124)
at org.apache.kafka.streams.processor.internals.AbstractProcessorContext.forward(AbstractProcessorContext.java:174)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:80)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:224)
... 6 more
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to [B
at org.apache.kafka.common.serialization.ByteArraySerializer.serialize(ByteArraySerializer.java:21)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:146)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:94)
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:87)
... 19 more
And here is the code I'm using. I've omitted certain classes for brevity. Note that I'm not using the Confluent KafkaAvro classes.
private synchronized void createStreamProcessor2() {
if (streams == null) {
try {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, getClass().getName());
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
StreamsConfig config = new StreamsConfig(props);
StreamsBuilder builder = new StreamsBuilder();
Map<String, Object> serdeProps = new HashMap<>();
serdeProps.put("schema.registry.url", schemaRegistryURL);
AvroSerde<CpaTrackingCallback> cpaTrackingCallbackAvroSerde = new AvroSerde<>(schemaRegistryURL);
cpaTrackingCallbackAvroSerde.configure(serdeProps, false);
// This is the key to telling kafka the specific Serde instance to use
// to deserialize the Avro encoded value
KStream<Long, CpaTrackingCallback> stream = builder.stream(CpaTrackingCallback.class.getName(),
Consumed.with(Serdes.Long(), cpaTrackingCallbackAvroSerde));
// provide a way to convert CpsTrackicking... info into just country codes
// (Long, CpaTrackingCallback) -> (countryCode:Integer, placeHolder:Long)
TransformerSupplier<Long, CpaTrackingCallback, KeyValue<Integer, Long>> transformer = new TransformerSupplier<Long, CpaTrackingCallback, KeyValue<Integer, Long>>() {
@Override
public Transformer<Long, CpaTrackingCallback, KeyValue<Integer, Long>> get() {
return new Transformer<Long, CpaTrackingCallback, KeyValue<Integer, Long>>() {
@Override
public void init(ProcessorContext context) {
// Not doing Punctuate so no need to store context
}
@Override
public KeyValue<Integer, Long> transform(Long key, CpaTrackingCallback value) {
return new KeyValue(value.getCountryCode(), 1);
}
@Override
public KeyValue<Integer, Long> punctuate(long timestamp) {
return null;
}
@Override
public void close() {
}
};
}
};
KTable<Integer, Long> countryCounts = stream.transform(transformer).groupByKey() //
.count(Materialized.as("country-counts"));
streams = new KafkaStreams(builder.build(), config);
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
streams.cleanUp();
streams.start();
try {
countryCountsView = waitUntilStoreIsQueryable("country-counts", QueryableStoreTypes.keyValueStore(),
streams);
}
catch (InterruptedException e) {
log.warn("Interrupted while waiting for query store to become available", e);
}
}
catch (Exception e) {
log.error(e);
}
}
}
The bare groupByKey() method on KStream uses the default serializer/deserializer (which you haven't set). Use the method groupByKey(Serialized<K,V> serialized), as in:
.groupByKey(Serialized.with(Serdes.Integer(), Serdes.Long()))
Also note that what you do in your custom TransformerSupplier can be done more simply with a KStream.map call, as sketched below.
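For example, a rough sketch assuming the same stream, serdes, and store name as in the question (mapping the value to 1L so it matches Serdes.Long()):
KTable<Integer, Long> countryCounts = stream
        .map((key, value) -> KeyValue.pair(value.getCountryCode(), 1L)) // (country code, placeholder count)
        .groupByKey(Serialized.with(Serdes.Integer(), Serdes.Long()))
        .count(Materialized.as("country-counts"));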
I have a project where I am consuming data from Kafka. Apparently, there are a couple of fields that will be included in the message headers, which I will also need to read for each message. Is there a way to do this in Flink currently?
Thanks!
@Jicaar, Kafka has actually had the notion of headers since version 0.11.0.0: https://issues.apache.org/jira/browse/KAFKA-4208
The problem is that flink-connector-kafka-0.11_2.11, which ships with flink-1.4.0 and supposedly supports kafka-0.11.0.0, simply ignores message headers when reading from Kafka.
So unfortunately there is no way to read those headers unless you implement your own Kafka consumer in Flink.
I'm also interested in reading Kafka message headers and hope the Flink team will add support for this.
I faced a similar issue and found a way to do this in Flink 1.8. Here is what I wrote:
FlinkKafkaConsumer<ObjectNode> consumer = new FlinkKafkaConsumer("topic", new JSONKeyValueDeserializationSchema(true){
ObjectMapper mapper = new ObjectMapper();
@Override
public ObjectNode deserialize(ConsumerRecord<byte[], byte[]> record) throws Exception {
ObjectNode result = super.deserialize(record);
if (record.headers() != null) {
Map<String, JsonNode> headers = StreamSupport.stream(record.headers().spliterator(), false).collect(Collectors.toMap(h -> h.key(), h -> (JsonNode)this.mapper.convertValue(new String(h.value()), JsonNode.class)));
result.set("headers", mapper.convertValue(headers, JsonNode.class));
}
return result;
}
}, kafkaProps);
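To use it (assuming a StreamExecutionEnvironment named env, which the snippet above doesn't show), attach the consumer as a source, for example:
DataStream<ObjectNode> stream = env.addSource(consumer);
stream.print(); // each ObjectNode now carries a "headers" field alongside the key, value and metadata fields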
Hope this helps!
Here's the code for newer versions of Flink, using the KafkaSource API.
KafkaSource<String> source = KafkaSource.<String>builder()
.setBootstrapServers(ParameterConfig.parameters.getRequired(ParameterConstant.KAFKA_ADDRESS))
.setTopics(ParameterConfig.parameters.getRequired(ParameterConstant.KAFKA_SOURCE_TOPICS))
.setGroupId(ParameterConfig.parameters.getRequired(ParameterConstant.KAFKA_SOURCE_GROUPID))
.setStartingOffsets(OffsetsInitializer.latest())
.setDeserializer(new KafkaRecordDeserializationSchema<String>() {
@Override
public void deserialize(ConsumerRecord<byte[], byte[]> consumerRecord, Collector<String> collector) {
try {
Map<String, String> headers = StreamSupport
.stream(consumerRecord.headers().spliterator(), false)
.collect(Collectors.toMap(Header::key, h -> new String(h.value())));
collector.collect(new JSONObject(headers).toString());
} catch (Exception e){
e.printStackTrace();
log.error("Headers Not found in Kafka Stream with consumer record : {}", consumerRecord);
}
}
@Override
public TypeInformation<String> getProducedType() {
return TypeInformation.of(new TypeHint<>() {});
}
})
.build();
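To wire this source into a job (assuming a StreamExecutionEnvironment named env, not shown above, and an arbitrary source name), something like:
DataStream<String> headersStream =
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-with-headers");
headersStream.print(); // each element is a JSON string of the record's headers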