I need to configure the retention policy of a particular topic during creation. I looked for a solution, but could only find a command-line alter command, as below:
./bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic --config retention.ms=1680000
Can someone let me know a way to configure it during creation, something like an XML or properties file configuration as in spring-mvc?
Spring Kafka lets you create new topics by declaring NewTopic @Beans in your application context. This requires a bean of type KafkaAdmin in the application context, which is created automatically when using Spring Boot. You could define your topic as follows:
@Bean
public NewTopic myTopic() {
    return TopicBuilder.name("my-topic")
            .partitions(4)
            .replicas(3)
            .config(TopicConfig.RETENTION_MS_CONFIG, "1680000")
            .build();
}
If you are not using Spring Boot, you'll additionally have to define the KafkaAdmin bean:
@Bean
public KafkaAdmin admin() {
    Map<String, Object> configs = new HashMap<>();
    configs.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    return new KafkaAdmin(configs);
}
If you want to edit the configuration of an existing topic, you'll have to use the AdminClient. Here's a snippet that changes retention.ms at the topic level:
Map<String, Object> config = new HashMap<>();
config.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
AdminClient client = AdminClient.create(config);
ConfigResource resource = new ConfigResource(ConfigResource.Type.TOPIC, "new-topic");

// Update the retention.ms value
ConfigEntry retentionEntry = new ConfigEntry(TopicConfig.RETENTION_MS_CONFIG, "1680000");
AlterConfigOp op = new AlterConfigOp(retentionEntry, AlterConfigOp.OpType.SET);

Map<ConfigResource, Collection<AlterConfigOp>> configs = new HashMap<>(1);
configs.put(resource, Arrays.asList(op));

AlterConfigsResult alterConfigsResult = client.incrementalAlterConfigs(configs);
alterConfigsResult.all().get(); // block until the change has been applied
The configuration can be applied automatically using this @PostConstruct method, which takes in the NewTopic beans:
@Autowired
private Set<NewTopic> topics;

@PostConstruct
public void reconfigureTopics() throws ExecutionException, InterruptedException {
    try (final AdminClient adminClient = AdminClient.create(Map.of(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaBootstrapServers))) {
        adminClient.incrementalAlterConfigs(topics.stream()
                .filter(topic -> topic.configs() != null)
                .collect(Collectors.toMap(
                        topic -> new ConfigResource(ConfigResource.Type.TOPIC, topic.name()),
                        topic -> topic.configs().entrySet()
                                .stream()
                                .map(e -> new ConfigEntry(e.getKey(), e.getValue()))
                                .peek(ce -> log.debug("configuring {} {} = {}", topic.name(), ce.name(), ce.value()))
                                .map(ce -> new AlterConfigOp(ce, AlterConfigOp.OpType.SET))
                                .collect(Collectors.toList())
                )))
                .all()
                .get();
    }
}
I guess you could use the AdminClient (https://kafka.apache.org/22/javadoc/index.html?org/apache/kafka/clients/admin/AdminClient.html) for this. You can create an AdminClient instance in your application and use the create or alter topic commands to manipulate topic configurations, including retention.
To create a topic using AdminClient programmatically with the specified retention time, do the following:
NewTopic topic = new NewTopic(topicName, numPartitions, replicationFactor);
topic.configs(Map.of(TopicConfig.RETENTION_MS_CONFIG, retentionMs.toString()));
adminClient.createTopics(List.of(topic));
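If you also want the call to block until the topic has actually been created, you can wait on the returned futures. A minimal sketch, assuming the adminClient and topic variables from the snippet above:
// all() returns a KafkaFuture<Void>; get() blocks until the broker has created every requested topic
adminClient.createTopics(List.of(topic)).all().get();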
I have an implementation that uses a KTable and CloudEvents to produce events, but for some unknown reason the event produced from the KTable is not formatted as a CloudEvent. The implementation is as below:
public void initKafkaStream() {
StreamsBuilder streamsBuilder = new StreamsBuilder();
PojoCloudEventDataMapper<TicketEvent> ticketEventMapper = PojoCloudEventDataMapper.from(objectMapper, TicketEvent.class);
KStream<String, CloudEvent> rawTicketStream = streamsBuilder.stream(rawTicketEvent, Consumed.with(Serdes.String(), cloudEventSerde));
rawTicketStream
.mapValues(e -> convertToPojo(e, ticketEventMapper))
.filter((k, v) -> v != null)
.groupByKey()
.aggregate(
AggregatedTicketEvent::new,
(key, val, agg) -> doAggregation(agg, val),
Materialized
.<String, AggregatedTicketEvent, KeyValueStore<Bytes, byte[]>>as("aggregatedTicket")
.withValueSerde(aggregatedTicketEventSerde)
.withLoggingDisabled()
)
.mapValues(result -> {
try {
return CloudEventBuilder.v1()
.withId(UUID.randomUUID().toString())
.withType("ticket_update")
.withSource(sourceTemplate.expand(result.getCurrent().getId()))
.withTime(result.getMeta().getOccurredAt())
.withData(objectMapper.writeValueAsBytes(result))
.withDataContentType("application/json")
.build();
} catch (JsonProcessingException e) {
throw new RuntimeException(e);
}
})
.toStream()
.to(aggregatedTicketEvent, Produced.with(Serdes.String(), cloudEventSerde));
streams = new KafkaStreams(streamsBuilder.build(streamsConfig), streamsConfig);
streams.setUncaughtExceptionHandler(ex -> StreamThreadExceptionResponse.REPLACE_THREAD);
streams.start();
}
Has anyone had such an issue?
Thanks in advance
The issue was that the configuration in the props was being overwritten by Kafka Streams in the serializer/deserializer, which by default sets the format to Encoding.BINARY. When the encoding is binary, the CloudEvents attributes are present only in the headers instead of in the payload. To make sure the serializers have the correct configuration, I set it directly on the CloudEventSerializer and CloudEventDeserializer. In this case, the Serdes.serdeFrom() call looks like the below:
Map<String, Object> ceSerializerConfigs = new HashMap<>();
ceSerializerConfigs.put(ENCODING_CONFIG, Encoding.STRUCTURED);
ceSerializerConfigs.put(EVENT_FORMAT_CONFIG, JsonFormat.CONTENT_TYPE);
CloudEventSerializer serializer = new CloudEventSerializer();
serializer.configure(ceSerializerConfigs, false);
CloudEventDeserializer deserializer = new CloudEventDeserializer();
deserializer.configure(ceSerializerConfigs, false);
this.cloudEventSerde = Serdes.serdeFrom(serializer, deserializer);
In order to get the CloudEvents format in the JSON payload, we have to use Encoding.STRUCTURED with the JSON content type; this does the magic and puts the CloudEvents data in the payload.
Hope this will help someone who's struggling with this issue!
Best,
I use Kafka on Windows: I run ZooKeeper first through the console, then Kafka. Everything starts perfectly, and the producer runs fine as well. But as soon as I start the consumer, logs start pouring into the console and I get a "Map failed" error. I tried to change the allocated memory in the Kafka server start file.
At the moment my kafka-server-start.sh file looks like this:
export KAFKA_HEAP_OPTS="-Xmx1G -Xms512M"
And if I delete the KafkaListener, everything starts up perfectly as well, but the interaction between the topics is important to me.
Kafka version: 2.13-3.2.1
Consumer property:
@Value("${spring.kafka.bootstrap-servers}")
private String bootstrapService;
public Map<String, Object> getDefaultConsumerConfig() {
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapService);
props.put(JsonDeserializer.TRUSTED_PACKAGES, "*");
return props;
}
Consumer config:
@Bean
public ConsumerFactory<String, ConfigurationEventDto> configurationEventDtoConsumerFactory() {
return new DefaultKafkaConsumerFactory<>(kafkaService.getDefaultConsumerConfig(),
new JsonDeserializer<>(),
new JsonDeserializer<>(ConfigurationEventDto.class, false));
}
@Bean
public ConcurrentKafkaListenerContainerFactory<String, ConfigurationEventDto> configurationEventDtoKafkaFactory(
ConsumerFactory<String, ConfigurationEventDto> configurationEventDtoConsumerFactory) {
ConcurrentKafkaListenerContainerFactory<String, ConfigurationEventDto> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(configurationEventDtoConsumerFactory);
return factory;
}
Kafka listener:
@KafkaListener(topics = "activity-record-configuration-event",
groupId = "activity-record-configuration-event",
containerFactory = "configurationEventDtoKafkaFactory")
void listen(ConfigurationEventDto configurationEventDto) {
log.info("new configurationEventDto received");
service.save(configurationEventDto);
}
And when I start my consumer microservice, the Kafka logs are:
java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:938)
at kafka.log.AbstractIndex.<init>(AbstractIndex.scala:124)
at kafka.log.OffsetIndex.<init>(OffsetIndex.scala:54)
at kafka.log.LazyIndex$.$anonfun$forOffset$1(LazyIndex.scala:106)
at kafka.log.LazyIndex.$anonfun$get$1(LazyIndex.scala:63)
at kafka.log.LazyIndex.get(LazyIndex.scala:60)
at kafka.log.LogSegment.offsetIndex(LogSegment.scala:64)
at kafka.log.LogSegment.readNextOffset(LogSegment.scala:453)
at kafka.log.LogLoader.$anonfun$recoverLog$6(LogLoader.scala:457)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
at scala.Option.getOrElse(Option.scala:201)
at kafka.log.LogLoader.recoverLog(LogLoader.scala:457)
at kafka.log.LogLoader.load(LogLoader.scala:162)
at kafka.log.UnifiedLog$.apply(UnifiedLog.scala:1810)
at kafka.log.LogManager.$anonfun$getOrCreateLog$1(LogManager.scala:901)
at scala.Option.getOrElse(Option.scala:201)
at kafka.log.LogManager.getOrCreateLog(LogManager.scala:852)
at kafka.cluster.Partition.createLog(Partition.scala:372)
at kafka.cluster.Partition.maybeCreate$1(Partition.scala:347)
at kafka.cluster.Partition.createLogIfNotExists(Partition.scala:354)
at kafka.cluster.Partition.$anonfun$makeLeader$1(Partition.scala:566)
at kafka.cluster.Partition.makeLeader(Partition.scala:543)
at kafka.server.ReplicaManager.$anonfun$makeLeaders$5(ReplicaManager.scala:1592)
at kafka.utils.Implicits$MapExtensionMethods$.$anonfun$forKeyValue$1(Implicits.scala:62)
at scala.collection.mutable.HashMap$Node.foreachEntry(HashMap.scala:633)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:499)
at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:1590)
at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:269)
at kafka.server.KafkaApis.handle(KafkaApis.scala:176)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
at java.lang.Thread.run(Thread.java:750)
And of course I cleaned the log folders!
I'm trying to figure out how I can test my Spring Cloud Stream Kafka Streams application.
The application looks like this:
Stream 1: Topic1 > Topic2
Stream 2: Topic2 + Topic3 joined > Topic4
Stream 3: Topic4 > Topic5
I tried different approaches, like the TestChannelBinder, but that approach only works with simple functions, not with Streams and Avro.
I decided to use EmbeddedKafka with MockSchemaRegistryClient. I can produce to a topic and also consume from the same topic again (topic1), but I'm not able to consume from topic2.
In my test application.yaml I put the following configuration (I'm only testing the first stream for now; I want to extend it once this works):
spring.application.name: processingapp
spring.cloud:
  function.definition: stream1 # not now ;stream2;stream3
  stream:
    bindings:
      stream1-in-0:
        destination: topic1
      stream1-out-0:
        destination: topic2
    kafka:
      binder:
        min-partition-count: 1
        replication-factor: 1
        auto-create-topics: true
        auto-add-partitions: true
      bindings:
        default:
          consumer:
            autoRebalanceEnabled: true
            resetOffsets: true
            startOffset: earliest
        stream1-in-0:
          consumer:
            keySerde: io.confluent.kafka.streams.serdes.avro.PrimitiveAvroSerde
            valueSerde: io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde
        stream1-out-0:
          producer:
            keySerde: io.confluent.kafka.streams.serdes.avro.PrimitiveAvroSerde
            valueSerde: io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde
      streams:
        binder:
          configuration:
            schema.registry.url: mock://localtest
            specific.avro.reader: true
My test looks like the following:
@RunWith(SpringRunner.class)
@SpringBootTest
public class Test {
private static final String INPUT_TOPIC = "topic1";
private static final String OUTPUT_TOPIC = "topic2";
@ClassRule
public static EmbeddedKafkaRule embeddedKafka = new EmbeddedKafkaRule(1, true, 1, INPUT_TOPIC, OUTPUT_TOPIC);
@BeforeClass
public static void setup() {
System.setProperty("spring.cloud.stream.kafka.binder.brokers", embeddedKafka.getEmbeddedKafka().getBrokersAsString());
}
@org.junit.Test
public void testSendReceive() throws IOException {
Map<String, Object> senderProps = KafkaTestUtils.producerProps(embeddedKafka.getEmbeddedKafka());
senderProps.put("key.serializer", LongSerializer.class);
senderProps.put("value.serializer", SpecificAvroSerializer.class);
senderProps.put("schema.registry.url", "mock://localtest");
AvroFileParser fileParser = new AvroFileParser();
DefaultKafkaProducerFactory<Long, Test1> pf = new DefaultKafkaProducerFactory<>(senderProps);
KafkaTemplate<Long, Test1> template = new KafkaTemplate<>(pf, true);
Test1 test1 = fileParser.parseTest1("src/test/resources/mocks/test1.json");
template.send(INPUT_TOPIC, 123456L, test1);
System.out.println("produced");
Map<String, Object> consumer1Props = KafkaTestUtils.consumerProps("testConsumer1", "false", embeddedKafka.getEmbeddedKafka());
consumer1Props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
consumer1Props.put("key.deserializer", LongDeserializer.class);
consumer1Props.put("value.deserializer", SpecificAvroDeserializer.class);
consumer1Props.put("schema.registry.url", "mock://localtest");
DefaultKafkaConsumerFactory<Long, Test1> cf = new DefaultKafkaConsumerFactory<>(consumer1Props);
Consumer<Long, Test1> consumer1 = cf.createConsumer();
consumer1.subscribe(Collections.singleton(INPUT_TOPIC));
ConsumerRecords<Long, Test1> records = consumer1.poll(Duration.ofSeconds(10));
consumer1.commitSync();
System.out.println("records count?");
System.out.println("" + records.count());
Test1 fetchedTest1;
fetchedTest1 = records.iterator().next().value();
assertThat(records.count()).isEqualTo(1);
System.out.println("found record");
System.out.println(fetchedTest1.toString());
Map<String, Object> consumer2Props = KafkaTestUtils.consumerProps("testConsumer2", "false", embeddedKafka.getEmbeddedKafka());
consumer2Props.put("key.deserializer", StringDeserializer.class);
consumer2Props.put("value.deserializer", TestAvroDeserializer.class);
consumer2Props.put("schema.registry.url", "mock://localtest");
DefaultKafkaConsumerFactory<String, Test2> consumer2Factory = new DefaultKafkaConsumerFactory<>(consumer2Props);
Consumer<String, Test2> consumer2 = consumer2Factory.createConsumer();
consumer2.subscribe(Collections.singleton(OUTPUT_TOPIC));
ConsumerRecords<String, Test2> records2 = consumer2.poll(Duration.ofSeconds(30));
consumer2.commitSync();
if (records2.iterator().hasNext()) {
System.out.println("has next");
} else {
System.out.println("has no next");
}
}
}
I receive the following exception when trying to consume and deserialize from topic2:
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro unknown schema for id 0
Caused by: java.io.IOException: Cannot get schema from schema registry!
at io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient.getSchemaBySubjectAndIdFromRegistry(MockSchemaRegistryClient.java:193) ~[kafka-schema-registry-client-6.2.0.jar:na]
at io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient.getSchemaBySubjectAndId(MockSchemaRegistryClient.java:249) ~[kafka-schema-registry-client-6.2.0.jar:na]
at io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient.getSchemaById(MockSchemaRegistryClient.java:232) ~[kafka-schema-registry-client-6.2.0.jar:na]
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer$DeserializationContext.schemaFromRegistry(AbstractKafkaAvroDeserializer.java:307) ~[kafka-avro-serializer-6.2.0.jar:na]
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:107) ~[kafka-avro-serializer-6.2.0.jar:na]
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:86) ~[kafka-avro-serializer-6.2.0.jar:na]
at io.confluent.kafka.serializers.KafkaAvroDeserializer.deserialize(KafkaAvroDeserializer.java:55) ~[kafka-avro-serializer-6.2.0.jar:na]
at org.apache.kafka.common.serialization.Deserializer.deserialize(Deserializer.java:60) ~[kafka-clients-2.7.1.jar:na]
at org.apache.kafka.streams.processor.internals.SourceNode.deserializeKey(SourceNode.java:54) ~[kafka-streams-2.7.1.jar:na]
at org.apache.kafka.streams.processor.internals.RecordDeserializer.deserialize(RecordDeserializer.java:65) ~[kafka-streams-2.7.1.jar:na]
at org.apache.kafka.streams.processor.internals.RecordQueue.updateHead(RecordQueue.java:176) ~[kafka-streams-2.7.1.jar:na]
at org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:112) ~[kafka-streams-2.7.1.jar:na]
at org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:185) ~[kafka-streams-2.7.1.jar:na]
at org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:895) ~[kafka-streams-2.7.1.jar:na]
at org.apache.kafka.streams.processor.internals.TaskManager.addRecordsToTasks(TaskManager.java:1008) ~[kafka-streams-2.7.1.jar:na]
at org.apache.kafka.streams.processor.internals.StreamThread.pollPhase(StreamThread.java:812) ~[kafka-streams-2.7.1.jar:na]
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:625) ~[kafka-streams-2.7.1.jar:na]
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:564) ~[kafka-streams-2.7.1.jar:na]
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:523) ~[kafka-streams-2.7.1.jar:na]
No message is consumed.
So I tried to override the SpecificAvroSerde, register the schemas directly, and use this deserializer.
public class TestAvroDeserializer<T extends org.apache.avro.specific.SpecificRecord>
extends SpecificAvroDeserializer<T> implements Deserializer<T> {
private final KafkaAvroDeserializer inner;
public TestAvroDeserializer() throws IOException, RestClientException {
MockSchemaRegistryClient mockedClient = new MockSchemaRegistryClient();
Schema.Parser parser = new Schema.Parser();
Schema test2Schema = parser.parse(new File("./src/main/resources/avro/test2.avsc"));
mockedClient.register("test2-value", test2Schema , 1, 0);
inner = new KafkaAvroDeserializer(mockedClient);
}
/**
* For testing purposes only.
*/
TestAvroDeserializer(final SchemaRegistryClient client) throws IOException, RestClientException {
MockSchemaRegistryClient mockedClient = new MockSchemaRegistryClient();
Schema.Parser parser = new Schema.Parser();
Schema test2Schema = parser.parse(new File("./src/main/resources/avro/test2.avsc"));
mockedClient.register("test2-value", test2Schema , 1, 0);
inner = new KafkaAvroDeserializer(mockedClient);
}
}
It doesn't work with this deserializer either. Does anyone have experience with how to do these tests with EmbeddedKafka and the MockSchemaRegistry? Or is there another approach I should use?
I'd be very glad if someone could help. Thank you in advance.
I found an appropriate way of integration testing my topology.
I use the TopologyTestDriver from the kafka-streams-test-utils package.
Include this dependency in your Maven build:
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams-test-utils</artifactId>
<scope>test</scope>
</dependency>
For the application described in the question, setting up the TopologyTestDriver would look like the following. The code is written sequentially just to show how it works.
@Test
void test() {
keySerde.configure(Map.of(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, "mock://schemas"), true);
valueSerdeTopic1.configure(Map.of(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, "mock://schemas"), false);
valueSerdeTopic2.configure(Map.of(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, "mock://schemas"), false);
final StreamsBuilder builder = new StreamsBuilder();
Configuration config = new Configuration(); // class where you declare your spring cloud stream functions
KStream<String, Topic1> input = builder.stream("topic1", Consumed.with(keySerde, valueSerdeTopic1));
KStream<String, Topic2> output = config.stream1().apply(input);
output.to("topic2");
Topology topology = builder.build();
Properties streamsConfig = new Properties();
streamsConfig.putAll(Map.of(
org.apache.kafka.streams.StreamsConfig.APPLICATION_ID_CONFIG, "toplogy-test-driver",
org.apache.kafka.streams.StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "ignored",
KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, "mock://schemas",
org.apache.kafka.streams.StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, PrimitiveAvroSerde.class.getName(),
org.apache.kafka.streams.StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, SpecificAvroSerde.class.getName()
));
TopologyTestDriver testDriver = new TopologyTestDriver(topology, streamsConfig);
TestInputTopic<String, Topic1> inputTopic = testDriver.createInputTopic("topic1", keySerde.serializer(), valueSerdeTopic1.serializer());
TestOutputTopic<String, Topic2> outputTopic = testDriver.createOutputTopic("topic2", keySerde.deserializer(), valueSerdeTopic2.deserializer());
inputTopic.pipeInput("key", topic1AvroModel); // Write to the input topic which applies the topology processor of your spring-cloud-stream app
KeyValue<String, Topic2> outputRecord = outputTopic.readKeyValue(); // Read from the output topic
}
If you write more tests, I recommend abstracting the setup code so you don't repeat yourself in each test (see the sketch below).
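For example, here is a minimal sketch of such shared setup, assuming JUnit 5 and the same serde and topology wiring as in the test above (buildTopology() and buildStreamsConfig() are hypothetical helpers holding the StreamsBuilder and Properties setup shown there):
private TopologyTestDriver testDriver;
private TestInputTopic<String, Topic1> inputTopic;
private TestOutputTopic<String, Topic2> outputTopic;

@BeforeEach
void setUpDriver() {
    // buildTopology() / buildStreamsConfig() are assumed helpers containing the setup shown above
    testDriver = new TopologyTestDriver(buildTopology(), buildStreamsConfig());
    inputTopic = testDriver.createInputTopic("topic1", keySerde.serializer(), valueSerdeTopic1.serializer());
    outputTopic = testDriver.createOutputTopic("topic2", keySerde.deserializer(), valueSerdeTopic2.deserializer());
}

@AfterEach
void tearDownDriver() {
    // Closing the driver cleans up state stores and temporary directories between tests
    testDriver.close();
}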
I highly suggest this example from the spring-cloud-streams-samples repository; it led me to the solution of using the TopologyTestDriver.
I have a Song class (song_id, song_name, song_duration).
....
package org.example.demo.model;
public class Flux {
private int user_id;
private int song_id;
private float listening_duration;
public Flux(int user_id, int song_id, float listening_duration) {
this.user_id = user_id;
this.song_id = song_id;
this.listening_duration = listening_duration;
}
...
I have a first program that sends some events to a Kafka topic (serialized in Avro):
props.put("key.serializer", StringSerializer.class.getName());
props.put("value.serializer", KafkaAvroSerializer.class.getName());
props.put("schema.registry.url","http://172.17.0.8:8081");
KafkaProducer<String, Flux> kafkaProducerFlux = new KafkaProducer<String, Flux>(props);
Within a loop:
Flux flux = Flux.newBuilder()
.setSongId(arraysong.get(s).getSongId())
.setUserId(arrayuser.get(u).getUserId())
.setListeningDuration(arraysong.get(s).getSongDuration())
.build();
kafkaProducerFlux.send(new ProducerRecord<String, Flux>(topic_events, flux));
....
Now, I would like to get my object back in a stream, as either:
KStream<String, Flux> source = builder.stream("music_flux");
KStream<String, GenericRecord> source = builder.stream("music_flux");
in order to be able to join on the user_id with inputs from other streams.
Thanks for your help.
You have to configure the key and value serdes in the Kafka Streams properties. Since the Flux schema is written to the schema registry by your Kafka producer, the KStream application will read the schema from there. You need to configure SCHEMA_REGISTRY_URL_CONFIG and VALUE_SERDE_CLASS_CONFIG as given below:
KStreamBuilder builder = new KStreamBuilder();
Properties props= new Properties();
// Set a few key parameters
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "kafka-app-1");
props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, SpecificAvroSerde.class);
props.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://172.17.0.8:8081");
StreamsConfig config = new StreamsConfig(props);
KStream<String, Flux> source = builder.stream("music_flux");
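Note that this snippet uses the older KStreamBuilder / StreamsConfig serde settings. As a rough sketch against newer kafka-streams versions (the broker address is an assumption; the topic and registry URL come from the question, and SpecificAvroSerde is Confluent's kafka-streams-avro-serde class), the equivalent setup configures the value serde explicitly and passes it via Consumed.with:
StreamsBuilder builder = new StreamsBuilder();

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "kafka-app-1");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption: your broker address

// Configure a SpecificAvroSerde for the Avro-generated Flux class against the schema registry
Map<String, String> serdeConfig = Collections.singletonMap(
        AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "http://172.17.0.8:8081");
SpecificAvroSerde<Flux> fluxSerde = new SpecificAvroSerde<>();
fluxSerde.configure(serdeConfig, false); // false = this serde is used for record values

// Consume the topic with explicit serdes instead of relying on the defaults
KStream<String, Flux> source = builder.stream("music_flux", Consumed.with(Serdes.String(), fluxSerde));

KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();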
I have a project where I am consuming data from Kafka. Apparently, there are a couple of fields included in the headers that I will need to read for each message as well. Is there a way to do this in Flink currently?
Thanks!
@Jicaar, actually Kafka has had the Header notion since version 0.11.0.0: https://issues.apache.org/jira/browse/KAFKA-4208
The problem is that flink-connector-kafka-0.11_2.11, which comes with flink-1.4.0 and supposedly supports kafka-0.11.0.0, just ignores message headers when reading from Kafka.
So unfortunately there is no way to read those headers unless you implement your own KafkaConsumer in Flink.
I'm also interested in reading Kafka message headers and hope the Flink team will add support for this.
I faced a similar issue and found a way to do this in Flink 1.8. Here is what I wrote:
FlinkKafkaConsumer<ObjectNode> consumer = new FlinkKafkaConsumer<>("topic", new JSONKeyValueDeserializationSchema(true) {
    ObjectMapper mapper = new ObjectMapper();

    @Override
    public ObjectNode deserialize(ConsumerRecord<byte[], byte[]> record) throws Exception {
        ObjectNode result = super.deserialize(record);
        if (record.headers() != null) {
            Map<String, JsonNode> headers = StreamSupport.stream(record.headers().spliterator(), false)
                    .collect(Collectors.toMap(h -> h.key(), h -> (JsonNode) this.mapper.convertValue(new String(h.value()), JsonNode.class)));
            result.set("headers", mapper.convertValue(headers, JsonNode.class));
        }
        return result;
    }
}, kafkaProps);
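To actually run it, the consumer plugs into a job with the legacy addSource API. A quick usage sketch (env and the job name are assumptions):
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Each ObjectNode carries "key", "value", "metadata" plus the extra "headers" field added above
DataStream<ObjectNode> records = env.addSource(consumer);
records.print();

env.execute("read-kafka-headers");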
Hope this helps!
Here's the code for new versions of Flink.
KafkaSource<String> source = KafkaSource.<String>builder()
.setBootstrapServers(ParameterConfig.parameters.getRequired(ParameterConstant.KAFKA_ADDRESS))
.setTopics(ParameterConfig.parameters.getRequired(ParameterConstant.KAFKA_SOURCE_TOPICS))
.setGroupId(ParameterConfig.parameters.getRequired(ParameterConstant.KAFKA_SOURCE_GROUPID))
.setStartingOffsets(OffsetsInitializer.latest())
.setDeserializer(new KafkaRecordDeserializationSchema<String>() {
@Override
public void deserialize(ConsumerRecord<byte[], byte[]> consumerRecord, Collector<String> collector) {
try {
Map<String, String> headers = StreamSupport
.stream(consumerRecord.headers().spliterator(), false)
.collect(Collectors.toMap(Header::key, h -> new String(h.value())));
collector.collect(new JSONObject(headers).toString());
} catch (Exception e){
e.printStackTrace();
log.error("Headers Not found in Kafka Stream with consumer record : {}", consumerRecord);
}
}
@Override
public TypeInformation<String> getProducedType() {
return TypeInformation.of(new TypeHint<>() {});
}
})
.build();
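To attach this KafkaSource to a job, a minimal usage sketch (the environment setup and job name are assumptions):
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// No event-time watermarks are needed just to forward the headers as JSON strings
DataStream<String> headerJson = env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-headers-source");
headerJson.print();

env.execute("read-kafka-headers");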