KTable with list of values - java

I would like to create a KTable<String, List<TimePeriod>>. When the app receives a Record<String, TimePeriod>, it should add the timePeriod to the corresponding list if the key already exists; otherwise it should create a new list containing this timePeriod.
KTable<String, List<TimePeriod>> kTable = streamsBuilder
    .stream(topologyConfig.exemptionsTopic(), Consumed.with(Serdes.String(), new TimePeriodSerde()))
    .groupByKey()
    .aggregate(
        ArrayList::new,                            /* initializer */
        (key, timePeriod, timePeriodList) -> {     /* aggregator */
            timePeriodList.add(timePeriod);
            return timePeriodList;
        },
        Materialized.<String, List<TimePeriod>, KeyValueStore<Bytes, byte[]>>as("storeName")
            .withValueSerde(Serdes.ListSerde(ArrayList.class, new TimePeriodSerde())));
I'm not entirely sure this is proper code.
Later I would like to join a KStream<String, Product> with this KTable; assume the key in both the KStream and the KTable is productName.
I would like to join a product with a time period if product.getProductionDate() lies in any of those time periods. So something like this:
ValueJoiner<Product, List<TimePeriod>, JoinedProduct> joiner = (product, periodList) -> {
    for (TimePeriod period : periodList) {
        if (product.isInRange(period)) {
            return new JoinedProduct(product, period);
        }
    }
    return new JoinedProduct(product, null);
};
streamsBuilder.stream(topologyConfig.productsTopic(), Consumed.with(Serdes.String(), new ProductSerde()))
    .leftJoin(kTable, joiner)
    .to(...);
When I try to execute this, there is an error while building the project, yet IntelliJ IDEA does not underline anything. I lean towards the mistake being in how I create the KTable.

Apache Flink join different DataStreams on specific key

I have two DataStreams: the first one, DataStream<String> source, receives records from a message broker, and the second one, SingleOutputOperator<Event> events, is the result of mapping the source into Event.class.
I have use cases that need SingleOutputOperator<Event> events and others that use DataStream<String> source. In one of the use cases that uses DataStream<String> source, I need to join the SingleOutputOperator<String> result obtained after applying some filters with SingleOutputOperator<Event> events. To avoid mapping the source into Event.class again (I already have that operation done and that stream), I need to look up each record of the filtered SingleOutputOperator<String> in SingleOutputOperator<Event> events and then apply another map to produce a SingleOutputOperator<EventOutDto>.
This is the idea as example:
DataStream<String> source = env.readFrom(source);
SingleOutputOperator<Event> events = source.map(s -> mapper.readValue(s, Event.class));

public void filterAndJoin(DataStream<String> source, SingleOutputOperator<Event> events) {
    SingleOutputOperator<String> filtered = source.filter(s -> new FilterFunction());
    SingleOutputOperator<EventOutDto> result = /* this will be the result of searching each record,
            based on id, from the filtered stream in the events stream, where the id must match
            and return the event if found */
            .map(event -> new EventOutDto(event)).addSink(new RichSinkFunction());
}
I have this code:
filtered.join(events)
        .where(k -> {
            JsonNode tree = mapper.readTree(k);
            String id = "";
            if (tree.get("Id") != null) {
                id = tree.get("Id").asText();
            }
            return id;
        })
        .equalTo(e -> {
            return e.Id;
        })
        .window(TumblingEventTimeWindows.of(Time.seconds(1)))
        .apply(new JoinFunction<String, Event, EventOutDto>() {
            @Override
            public EventOutDto join(String s, Event event) throws Exception {
                return new EventOutDto(event);
            }
        })
        .addSink(new SinkFunction());
In the above code everything runs, the ids are the same, so the where(id).equalTo(id) match should work, but the process never reaches the apply function.
Observation: watermarks are assigned with the same timestamp.
Questions:
Any idea why?
Did I explain myself clearly?
I solved the join by doing this:
SingleOutputStreamOperator<ObjectDTO> triggers = candidates
        .keyBy(new KeySelector())
        .intervalJoin(keyedStream.keyBy(e -> e.Id))
        .between(Time.milliseconds(-2), Time.milliseconds(1))
        .process(new ProcessFunctionOne())
        .keyBy(k -> k.otherId)
        .process(new ProcessFunctionTwo());
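For completeness, a rough sketch of what ProcessFunctionOne might look like (the class name, the String/Event input types and the EventOutDto mapping are assumptions based on the snippets above):
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.util.Collector;

// Hypothetical join function for the intervalJoin above: for every pair of elements whose
// keys match and whose timestamps fall inside the interval, emit one DTO.
public class ProcessFunctionOne extends ProcessJoinFunction<String, Event, EventOutDto> {

    @Override
    public void processElement(String rawJson, Event event, Context ctx, Collector<EventOutDto> out) {
        // rawJson is the filtered source record, event is the already-mapped Event with the same Id
        out.collect(new EventOutDto(event));
    }
}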

Java - Spring Boot - Reactive Redis Stream ( TEXT_EVENT_STREAM_VALUE )

I want to write an endpoint which always shows the newest messages of a redis stream (reactive).
The entities look like this {'key' : 'some_key', 'status' : 'some_string'}.
So I would like to have the following result:
The page is called; the content would, for instance, display an entity:
{'key' : 'abc', 'status' : 'status_A'}
the page is not closed
Then a new entity is added to the stream
XADD mystream * key abc status statusB
Now I would like to see every item of the stream, without reloading the tab:
{'key' : 'abc', 'status' : 'status_A'}
{'key' : 'abc', 'status' : 'status_B'}
When I try to mock this behavior it works and I get the expected output.
#GetMapping(value="/light/live/mock", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
#ResponseBody
public Flux<Light> liveLightMock() {
List<Light> test = Arrays.asList(new Light("key", "on") , new Light("key", "off"),
new Light("key", "on") , new Light("key", "off"),
new Light("key", "on") , new Light("key", "off"),
new Light("key", "on") , new Light("key", "off"),
new Light("key", "on") , new Light("key", "off"));
return Flux.fromIterable(test).delayElements(Duration.ofMillis(500));
}
The individual elements of the list are displayed one after another with a 500 ms delay between items.
However, when I try to access Redis instead of the mocked variant, it no longer works. I am testing the partial functions one after another: for my idea to work, first the save function (1) must work; if the save function works, displaying old records without reactive features must work (2); and last but not least, if both work, I need to get the reactive part going.
Maybe you can help me get the reactive part working. I have been working on it for days without any improvement.
Thanks :)
Test 1) - Saving Function (Short Version)
It looks like it's working.
#GetMapping(value="/light/create", produces = MediaType.APPLICATION_JSON_VALUE)
#ResponseBody
public Flux<Light> createTestLight() {
String status = (++statusIdx % 2 == 0) ? "on" : "off";
Light light = new Light(Consts.LIGHT_ID, status);
return LightRepository.save(light).flux();
}
@Override
public Mono<Light> save(Light light) {
    Map<String, String> lightMap = new HashMap<>();
    lightMap.put("key", light.getKey());
    lightMap.put("status", light.getStatus());
    return operations.opsForStream(redisSerializationContext)
            .add("mystream", lightMap)
            .map(__ -> light);
}
Test 2) - Loading/Reading Function (Short Version)
It seems to be working, but not reactively: I added a new entity while a web view was open; the view showed all existing items but did not update when I added new items. After reloading, I saw every item.
How can I get getLights to return something that works with TEXT_EVENT_STREAM_VALUE and subscribes to the stream?
@Override
public Flux<Object> getLights() {
    ReadOffset readOffset = ReadOffset.from("0");
    StreamOffset<String> offset = StreamOffset.fromStart("mystream"); // fromStart or Latest
    Function<? super MapRecord<String, Object, Object>, ? extends Publisher<?>> mapFunc = entries -> {
        Map<Object, Object> kvp = entries.getValue();
        String key = (String) kvp.get("key");
        String status = (String) kvp.get("status");
        Light light = new Light(key, status);
        return Flux.just(light);
    };
    return operations.opsForStream()
            .read(offset)
            .flatMap(mapFunc);
}
#GetMapping(value="/light/live", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
#ResponseBody
public Flux<Object> lightLive() {
return LightRepository.getLights();
}
Test 1) - Saving Function (Long Version)
The endpoint and the saving function are part of different classes.
String status = (++statusIdx % 2 == 0) ? "on" : "off"; flip-flops the status between on and off.
#GetMapping(value="/light/create", produces = MediaType.APPLICATION_JSON_VALUE)
#ResponseBody
public Flux<Light> createTestLight() {
String status = (++statusIdx % 2 == 0) ? "on" : "off";
Light light = new Light(Consts.LIGHT_ID, status);
return LightRepository.save(light).flux();
}
@Override
public Mono<Light> save(Light light) {
    Map<String, String> lightMap = new HashMap<>();
    lightMap.put("key", light.getKey());
    lightMap.put("status", light.getStatus());
    return operations.opsForStream(redisSerializationContext)
            .add("mystream", lightMap)
            .map(__ -> light);
}
To validate the functions I:
Deleted the stream, to empty it:
127.0.0.1:6379> del mystream
(integer) 1
127.0.0.1:6379> XLEN myStream
(integer) 0
Called the creation endpoint twice (/light/create).
I expected the stream to now have two items, one with status = on and one with status = off:
127.0.0.1:6379> XLEN mystream
(integer) 2
127.0.0.1:6379> xread STREAMS mystream 0-0
1) 1) "mystream"
2) 1) 1) "1610456865517-0"
2) 1) "key"
2) "light_1"
3) "status"
4) "off"
2) 1) "1610456866708-0"
2) 1) "key"
2) "light_1"
3) "status"
4) "on"
It looks like the saving part is working.
Test 2) - Loading/Reading Function (Long Version)
Seems to be working, but not reactively: I add a new entity and the page only shows the new value after a reload.
@Override
public Flux<Object> getLights() {
    ReadOffset readOffset = ReadOffset.from("0");
    StreamOffset<String> offset = StreamOffset.fromStart("mystream"); // fromStart or Latest
    Function<? super MapRecord<String, Object, Object>, ? extends Publisher<?>> mapFunc = entries -> {
        Map<Object, Object> kvp = entries.getValue();
        String key = (String) kvp.get("key");
        String status = (String) kvp.get("status");
        Light light = new Light(key, status);
        return Flux.just(light);
    };
    return operations.opsForStream()
            .read(offset)
            .flatMap(mapFunc);
}
#GetMapping(value="/light/live", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
#ResponseBody
public Flux<Object> lightLive() {
return LightRepository.getLights();
}
1. Calling /light/live -> I should have N entries. If I can see entries, the normal (non-reactive) display is working.
2. Calling /light/create twice -> the live view should have gained 2 entries -> N+2 entries.
3. Waiting 1 minute, just to be safe.
4. The view should show N+2 entries if the reactive part is working.
5. Refreshing the view from step 1 (/light/live) should still show the same amount if the reactive part works.
Displaying the information works (1), and the adding part of (2) worked (checked via the terminal), but (4) did not work;
ergo the display is working, but it is not reactive.
After I refreshed the browser (5) I got the expected N+2 entries, so (2) worked as well.
There's a misconception here: reading from Redis reactively does not mean you have subscribed to new events.
Reactive will not give you live updates; it calls Redis once and displays whatever is there. So even if you wait for a day or two, nothing is going to change in the UI/console; you will still see N entries.
You need to either use Redis PUB/SUB or call Redis repeatedly to get the latest updates.
EDIT:
A working solution..
private List<Light> reactiveReadToList() {
    log.info("reactiveReadToList");
    return read().collectList().block();
}

private Flux<Light> read() {
    StreamOffset<Object> offset = StreamOffset.fromStart("mystream");
    return redisTemplate
            .opsForStream()
            .read(offset)
            .flatMap(e -> {
                Map<Object, Object> kvp = e.getValue();
                String key = (String) kvp.get("key");
                String id = (String) kvp.get("id");
                String status = (String) kvp.get("status");
                Light light = new Light(id, key, status);
                log.info("{}", light);
                return Flux.just(light);
            });
}
A reader that reads data from Redis on demand using the reactive template and sends it to the client, tracking an offset as it goes. Here it sends one event at a time, but we could send all of them at once.
@RequiredArgsConstructor
class DataReader {
    @NonNull FluxSink<Light> sink;
    private List<Light> readLights = null;
    private int currentOffset = 0;

    void register() {
        readLights = reactiveReadToList();
        sink.onRequest(e -> {
            long demand = sink.requestedFromDownstream();
            for (int i = 0; i < demand && currentOffset < readLights.size(); i++, currentOffset++) {
                sink.next(readLights.get(currentOffset));
            }
            if (currentOffset == readLights.size()) {
                readLights = reactiveReadToList();
                currentOffset = 0;
            }
        });
    }
}
A method that uses DataReader to generate the Flux:
public Flux<Light> getLights() {
    return Flux.create(e -> new DataReader(e).register());
}
Now we've added an onRequest callback on the sink to handle the client's demand; this reads data from the Redis stream as required and sends it to the client.
This looks very CPU intensive. Maybe we should delay the calls if there are no new events, for example by adding a sleep inside the register method when we see there are no new elements in the stream.
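One possible direction for that (a rough, untested sketch; imports omitted to match the snippets above): poll the stream on an interval and remember the id of the last record already delivered, so each poll only fetches new entries. The redisTemplate field and the two-argument Light constructor are taken from the question; the one-second interval and the AtomicReference bookkeeping are assumptions.
// Poll "mystream" once per second, starting after the last record id we have already emitted.
public Flux<Light> getLights() {
    AtomicReference<String> lastSeenId = new AtomicReference<>("0");
    return Flux.interval(Duration.ofSeconds(1))
            .concatMap(tick -> redisTemplate
                    .opsForStream()
                    .read(StreamOffset.create("mystream", ReadOffset.from(lastSeenId.get()))))
            .map(record -> {
                lastSeenId.set(record.getId().getValue()); // remember the newest entry delivered so far
                Map<Object, Object> kvp = record.getValue();
                return new Light((String) kvp.get("key"), (String) kvp.get("status"));
            });
}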

Error while creating KTable with custom key

Use case: there is a topic with messages (null, Metadata). I need to create a KTable from the topic with metadata.entity_id as the key and the metadata as the value. This table will later be used in a join with a stream keyed the same way.
private final static String KAFKA_BROKERS = "localhost:9092";
private final static String APPLICATION_ID = "TestMetadataTable";
private final static String AUTO_OFFSET_RESET_CONFIG = "earliest";
private final static String METADATA_TOPIC = "test-metadata-topic";

public static void main(String args[]) {
    // Setting the Stream configuration params.
    final Properties kafkaStreamConfiguration = new Properties();
    kafkaStreamConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, APPLICATION_ID);
    kafkaStreamConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG, APPLICATION_ID);
    kafkaStreamConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, AUTO_OFFSET_RESET_CONFIG);
    kafkaStreamConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, KAFKA_BROKERS);

    // Creating Serdes for MetricMetadata
    GenericJsonSerializer<MetricMetadata> metadataJsonSerializer = new GenericJsonSerializer<MetricMetadata>();
    GenericJsonDeserializer<MetricMetadata> metadataJsonDeserializer = new GenericJsonDeserializer<MetricMetadata>(MetricMetadata.class);
    Serde<MetricMetadata> metadataSerde = Serdes.serdeFrom(metadataJsonSerializer, metadataJsonDeserializer);

    // Creating kafka stream.
    final StreamsBuilder builder = new StreamsBuilder();
    KTable<String, MetricMetadata> metaTable = builder.table(METADATA_TOPIC, Consumed.with(Serdes.String(), metadataSerde))
            .groupBy((key, value) -> KeyValue.pair(value.getEntity_id(), value))
            .aggregate(
                    () -> null,
                    (key, value, aggValue) -> value,
                    (key, value, aggValue) -> value
            );

    final KafkaStreams streams = new KafkaStreams(builder.build(), kafkaStreamConfiguration);
    streams.start();
}
Once I push a message to METADATA_TOPIC, this results in the error below. Am I missing something here? kafka-streams 2.2.0.
Exception in thread "TestMetadataTable-StreamThread-1" org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Failed to flush state store test-metadata-topic-STATE-STORE-0000000000
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:242)
at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:204)
at org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:519)
at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:471)
at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:459)
at org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:286)
at org.apache.kafka.streams.processor.internals.TaskManager.commitAll(TaskManager.java:412)
at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:1057)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:911)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:805)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:774)
Caused by: org.apache.kafka.streams.errors.StreamsException: A serializer (key: org.apache.kafka.common.serialization.ByteArraySerializer / value: org.apache.kafka.streams.kstream.internals.ChangedSerializer) is not compatible to the actual key or value type (key type: java.lang.String / value type: org.apache.kafka.streams.kstream.internals.Change). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters.
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:94)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:183)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:162)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:122)
at org.apache.kafka.streams.kstream.internals.KTableRepartitionMap$KTableMapProcessor.process(KTableRepartitionMap.java:95)
at org.apache.kafka.streams.kstream.internals.KTableRepartitionMap$KTableMapProcessor.process(KTableRepartitionMap.java:72)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:117)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:183)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:162)
at org.apache.kafka.streams.kstream.internals.ForwardingCacheFlushListener.apply(ForwardingCacheFlushListener.java:42)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.putAndMaybeForward(CachingKeyValueStore.java:102)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.lambda$initInternal$0(CachingKeyValueStore.java:79)
at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:141)
at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:99)
at org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:124)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.flush(CachingKeyValueStore.java:127)
at org.apache.kafka.streams.state.internals.WrappedStateStore.flush(WrappedStateStore.java:72)
at org.apache.kafka.streams.state.internals.MeteredKeyValueStore.flush(MeteredKeyValueStore.java:224)
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:239)
... 10 more
Caused by: java.lang.ClassCastException: class java.lang.String cannot be cast to class [B (java.lang.String and [B are in module java.base of loader 'bootstrap')
at org.apache.kafka.common.serialization.ByteArraySerializer.serialize(ByteArraySerializer.java:21)
at org.apache.kafka.common.serialization.Serializer.serialize(Serializer.java:60)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:161)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:102)
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:89)
... 28 more
In this case, you need to provide Serdes to the KTable.groupBy() operation via Grouped as calling groupBy triggers a repartition. You'll also need to provide the same Serdes to the aggregate operation for the state store.
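For reference, a rough sketch of where those Serdes would go if you kept the KTable-based topology (the subtractor simply keeps the current aggregate; that part is an assumption), though the null-key point below still applies:
KTable<String, MetricMetadata> metaTable = builder
        .table(METADATA_TOPIC, Consumed.with(Serdes.String(), metadataSerde))
        .groupBy(
                (key, value) -> KeyValue.pair(value.getEntity_id(), value),
                Grouped.with(Serdes.String(), metadataSerde))       // Serdes for the repartition topic
        .aggregate(
                () -> null,
                (key, value, aggValue) -> value,                    // adder: keep the latest value
                (key, value, aggValue) -> aggValue,                 // subtractor (assumed no-op here)
                Materialized.with(Serdes.String(), metadataSerde)); // Serdes for the state store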
Also, since the key is null, I think you should use a KStream initially. Then call groupByKey (you still need to provide Serdes via Grouped), and aggregation will give you the KTable you want.
Off the top of my head, something like this should work
builder.stream(METADATA_TOPIC, Consumed.with(Serdes.String(), metadataSerde))
    .selectKey((key, value) -> value.getEntity_id())
    .groupByKey(Grouped.with(Serdes.String(), metadataSerde))
    .aggregate(
        () -> null,
        (key, value, aggValue) -> value,
        Materialized.with(Serdes.String(), metadataSerde)
    );

Extracting Timestamp from producer message

I really need help!
I can't extract the timestamp of a message sent by a producer. In my project I work with JSON: I have a class in which I define the keys and one in which I define the values of the message that I send via a producer to a "Raw" topic. I have two other classes that do the same thing for the output message that my consumer will read on the topic called "Tdt". In the main class KafkaStreams.java I define the stream and map the keys and values. Starting Kafka locally, I start a producer that writes a message to the "raw" topic with keys and values, then in another shell the consumer starts reading the output message on the "tdt" topic. How do I get the event timestamp? I need to know the timestamp at which the message was sent by the producer. Do I need a TimestampExtractor?
Here is my main KafkaStreams class (my application works fine, I just need the timestamp):
#Bean("app1StreamTopology")
public KStream<LibAssIbanRawKey, LibAssIbanRawValue> kStream() throws ParseException {
JsonSerde<Dwsitspr4JoinValue> Dwsitspr4JoinValueSerde = new JsonSerde<>(Dwsitspr4JoinValue.class);
KStream<LibAssIbanRawKey, LibAssIbanRawValue> stream = defaultKafkaStreamsBuilder.stream(inputTopic);
stream.peek((k,v) -> logger.info("Debug3 Chiave descrizione -> ({})",v.getCATRAPP()));
GlobalKTable<Integer, Dwsitspr4JoinValue> categoriaRapporto = defaultKafkaStreamsBuilder
.globalTable(temptiptopicname,
Consumed.with(Serdes.Integer(), Dwsitspr4JoinValueSerde)
// .withOffsetResetPolicy(Topology.AutoOffsetReset.EARLIEST)
);
logger.info("Debug3 Chiave descrizione -> ({})",categoriaRapporto.toString()) ;
stream.peek((k,v) -> logger.info("Debug4 Chiave descrizione -> ({})",v.getCATRAPP()) );
stream
.join(categoriaRapporto, (k, v) -> v.getCATRAPP(), (valueStream, valueGlobalKtable) -> {
// Value mapping
LibAssIbanTdtValue newValue = new LibAssIbanTdtValue();
newValue.setDescrizioneRidottaCodiceCategoriaDelRapporto(valueGlobalKtable.getDescrizioneRidotta());
newValue.setDescrizioneEstesaCodiceCategoriaDelRapporto(valueGlobalKtable.getDescrizioneEstesa());
newValue.setIdentificativo(valueStream.getAUD_CCID());
.
.
.//Other Value Mapped
.
.
.map((key, value) -> {
// Key mapping
LibAssIbanTdtKey newKey = new LibAssIbanTdtKey();
newKey.setData(dtf.format(localDate));
newKey.setIdentificatoreUnivocoDellaRigaDiTabella(key.getTABROWID());
return KeyValue.pair(newKey, value);
}).to(outputTopic, Produced.with(new JsonSerde<>(LibAssIbanTdtKey.class), new JsonSerde<>(LibAssIbanTdtValue.class)));
return stream;
}
}
Yes you need a TimestampExtractor.
public class YourTimestampExtractor implements TimestampExtractor {

    @Override
    public long extract(ConsumerRecord<Object, Object> consumerRecord, long l) {
        // do whatever you want with the timestamp available via consumerRecord.timestamp()
        ...
        // return the timestamp you want to use (here the default)
        return consumerRecord.timestamp();
    }
}
You'll need to tell Kafka Streams what extractor to use via the config key StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG.
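A minimal sketch of wiring that in (here via plain Properties; in a Spring setup it goes wherever the other Streams properties are already defined):
// Register the custom extractor globally through the Streams configuration.
// `props` stands for whatever Properties object the application already uses.
Properties props = new Properties();
props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
        YourTimestampExtractor.class.getName());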

Null value in spark streaming from Kafka

I have a simple program with which I'm trying to receive data using Kafka. When I start a Kafka producer and send data, for example "Hello", I get this when I print the message: (null, Hello). I don't know why this null appears. Is there any way to avoid it? I think it's due to the first parameter of Tuple2<String, String>, but I only want to print the second parameter. Another thing: when I print using System.out.println("inside map " + message); no message appears. Does someone know why? Thanks.
public static void main(String[] args) {
    SparkConf sparkConf = new SparkConf().setAppName("org.kakfa.spark.ConsumerData").setMaster("local[4]");
    // Substitute 127.0.0.1 with the actual address of your Spark Master (or use "local" to run in local mode)
    sparkConf.set("spark.cassandra.connection.host", "127.0.0.1");

    // Create the context with a 2 second batch size
    JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(2000));

    Map<String, Integer> topicMap = new HashMap<>();
    String[] topics = KafkaProperties.TOPIC.split(",");
    for (String topic : topics) {
        topicMap.put(topic, KafkaProperties.NUM_THREADS);
    }

    /* connection to cassandra */
    CassandraConnector connector = CassandraConnector.apply(sparkConf);
    System.out.println("+++++++++++ cassandra connector created ++++++++++++++++++++++++++++");

    /* Receive kafka inputs */
    JavaPairReceiverInputDStream<String, String> messages =
            KafkaUtils.createStream(jssc, KafkaProperties.ZOOKEEPER, KafkaProperties.GROUP_CONSUMER, topicMap);
    System.out.println("+++++++++++++ streaming-kafka connection done +++++++++++++++++++++++++++");

    JavaDStream<String> lines = messages.map(
            new Function<Tuple2<String, String>, String>() {
                public String call(Tuple2<String, String> message) {
                    System.out.println("inside map " + message);
                    return message._2();
                }
            }
    );

    messages.print();
    jssc.start();
    jssc.awaitTermination();
}
Q1) Null values:
Messages in Kafka are keyed, which means they all have a (key, value) structure.
When you see (null, Hello), it is because the producer published a (null, "Hello") record to the topic.
If you want to omit the key in your processing, map the original DStream to remove the key: kafkaDStream.map(new Function<Tuple2<String, String>, String>() {...})
Q2) System.out.println("inside map "+ message); does not print. A couple of classical reasons:
Transformations are applied in the executors, so when running in a cluster, that output will appear in the executors and not on the master.
Operations are lazy and DStreams need to be materialized for operations to be applied.
In this specific case, the JavaDStream<String> lines is never materialized, i.e. it is never used in an output operation. Therefore the map is never executed.
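A minimal sketch of the fix, reusing the lines DStream from the question: give lines an output operation so the map (and the println inside it) actually runs each batch.
// Any output operation on `lines` materializes it; print() is the simplest.
lines.print();

jssc.start();
jssc.awaitTermination();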
