Process and check events with Kafka Streams during a time period - Java

I have a KStream eventsStream, which gets its data from the topic "events".
There are two types of events; their keys look like this:
1. {user_id = X, event_id = 1} {..value, includes time_event...}
2. {user_id = X, event_id = 2} {..value, includes time_event...}
I need to forward events with event_id = 1 to the topic "results" if, within 10 minutes, the same user does not send an event with event_id = 2.
For example:
1. First case: we get {user_id = 100, event_id = 1} {.. time_event = xxxx ...} and no event {user_id = 100, event_id = 2} arrives within 10 minutes, so we write it to the results topic.
2. Second case: we get {user_id = 100, event_id = 1} {.. time_event = xxxx ...} and an event {user_id = 100, event_id = 2} {.. time_event = xxxx + 5 minutes ...} arrives within those 10 minutes, so we do not write it to the results topic.
How can this behavior be implemented in Java with Kafka Streams?
My code:
public class ResultStream {

    public static KafkaStreams newStream() {
        Properties properties = Config.getProperties("ResultStream");
        Serde<String> stringSerde = Serdes.String();

        StreamsBuilder builder = new StreamsBuilder();

        StoreBuilder<KeyValueStore<String, String>> store =
                Stores.keyValueStoreBuilder(
                        Stores.inMemoryKeyValueStore("inmemory"),
                        stringSerde,
                        stringSerde
                );
        builder.addStateStore(store);

        KStream<String, String> resourceEventStream = builder.stream(EVENTS.topicName(), Consumed.with(stringSerde, stringSerde));
        resourceEventStream.print(Printed.toSysOut());

        resourceEventStream.process(() -> new CashProcessor("inmemory"), "inmemory");
        resourceEventStream.process(() -> new FilterProcessor("inmemory", resourceEventStream), "inmemory");

        Topology topology = builder.build();
        return new KafkaStreams(topology, properties);
    }
}
public class FilterProcessor implements Processor {

    private ProcessorContext context;
    private String eventStoreName;
    private KeyValueStore<String, String> eventStore;
    private KStream<String, String> stream;

    public FilterProcessor(String eventStoreName, KStream<String, String> stream) {
        this.eventStoreName = eventStoreName;
        this.stream = stream;
    }

    @Override
    public void init(ProcessorContext processorContext) {
        this.context = processorContext;
        eventStore = (KeyValueStore) processorContext.getStateStore(eventStoreName);
    }

    @Override
    public void process(Object key, Object value) {
        this.context.schedule(Duration.ofMinutes(1), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
            System.out.println("Scheduler is working");
            stream.filter((k, v) -> {
                JsonObject events = new Gson().fromJson(k, JsonObject.class);
                if (***condition***) {
                    return true;
                }
                return false;
            }).to("results");
        });
    }

    @Override
    public void close() {
    }
}
CashProcessor's only role is to put events into the local store, and to delete the record with event_id = 1 for a user when an event with event_id = 2 arrives from the same user.
FilterProcessor should filter events using the local store every minute. But I can't get this processing to run correctly (at least not the way I'm doing it now)...
I really need help.

Why do you pass a KStream into your processor? That is not how the DSL works.
Because you already "connect" your processors via resourceEventStream.process(), your FilterProcessor#process(key, value) method will be called automatically for each record in the stream. However, KStream#process() is a terminal operation and therefore does not allow you to send any data downstream. Instead, you may want to use transform(), which is basically process() plus an output KStream.
To actually forward data downstream from your punctuation, use context.forward() with the ProcessorContext that is provided via the init() method.
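
A rough sketch of what that could look like for this use case, not a drop-in implementation: ExpiredEventTransformer and extractTimestamp are illustrative names, and it assumes the key and value payloads are JSON that Gson can parse and that the store is the "inmemory" store registered above.

import com.google.gson.Gson;
import com.google.gson.JsonObject;
import java.time.Duration;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

public class ExpiredEventTransformer implements Transformer<String, String, KeyValue<String, String>> {

    private final String storeName;
    private ProcessorContext context;
    private KeyValueStore<String, String> eventStore;

    public ExpiredEventTransformer(String storeName) {
        this.storeName = storeName;
    }

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
        this.eventStore = (KeyValueStore<String, String>) context.getStateStore(storeName);
        // Scan the store once per minute and forward events with event_id = 1
        // that have been waiting for longer than 10 minutes.
        context.schedule(Duration.ofMinutes(1), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
            try (KeyValueIterator<String, String> it = eventStore.all()) {
                while (it.hasNext()) {
                    KeyValue<String, String> entry = it.next();
                    long storedAt = extractTimestamp(entry.value); // illustrative helper
                    if (timestamp - storedAt >= Duration.ofMinutes(10).toMillis()) {
                        context.forward(entry.key, entry.value); // goes to the downstream .to("results")
                        eventStore.delete(entry.key);
                    }
                }
            }
        });
    }

    @Override
    public KeyValue<String, String> transform(String key, String value) {
        JsonObject k = new Gson().fromJson(key, JsonObject.class);
        String userId = k.get("user_id").getAsString();
        int eventId = k.get("event_id").getAsInt();
        if (eventId == 1) {
            eventStore.put(userId, value);   // remember the pending event
        } else if (eventId == 2) {
            eventStore.delete(userId);       // the user answered in time, drop it
        }
        return null; // nothing is emitted per record; the punctuator forwards instead
    }

    @Override
    public void close() { }

    private long extractTimestamp(String value) {
        // illustrative: read time_event from the stored value
        return new Gson().fromJson(value, JsonObject.class).get("time_event").getAsLong();
    }
}

Wired into the topology it would replace both process() calls, roughly: resourceEventStream.transform(() -> new ExpiredEventTransformer("inmemory"), "inmemory").to("results", Produced.with(stringSerde, stringSerde));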

Related

RMapCache onCreated listener calls multiple times

In a Spring Boot application I am trying to add data to Redis using Redisson.
Below is the sample code for adding data to Redis.
RMapCache<String, String> map = redisson.getMapCache("cacheName");
if (value != null && map != null) {
    map.addListener(new EntryCreatedListener<String, String>() {
        @Override
        public void onCreated(EntryEvent<String, String> event) {
            RKeys rkeys = redisson.getKeys();
            long ttl = rkeys.remainTimeToLive(event.getKey());
            System.out.println("on created key " + event.getKey() + " " + ttl + " " + map.remainTimeToLive());
        }
    });
    map.putIfAbsent(key, value, ttl, TimeUnit.SECONDS);
}
The Redisson version used is 3.13.1.
Output: the print statement is printed multiple times.

ProcessorContext schedule only executed once

I'm using Apache Kafka Streams and added a transform() to my stream:
final StreamsBuilder streamsBuilder = new StreamsBuilder();

final StoreBuilder<KeyValueStore<String, byte[]>> correlationStore =
        Stores.keyValueStoreBuilder(
                Stores.persistentKeyValueStore(STORE_NAME),
                Serdes.String(),
                Serdes.ByteArray());

streamsBuilder.addStateStore(correlationStore);

streamsBuilder.stream(topicName, inputConsumed)
        .peek(InboundPendingMessageStreamer::logEntries)
        .transform(() -> new CleanerTransformer<String, byte[], KeyValue<String, byte[]>>(Duration.ofMillis(5000), STORE_NAME), STORE_NAME)
        .toTable();
I'm having difficulty understanding the CleanerTransformer class I created, where, in the init method, I set up a schedule with a scanFrequency and a PunctuationType.
@Override
public void init(ProcessorContext context) {
    this.stateStore = context.getStateStore(purgeStoreName);
    context.schedule(scanFrequency, PunctuationType.STREAM_TIME, timestamp -> {
        try (final KeyValueIterator<K, byte[]> all = stateStore.all()) {
            while (all.hasNext()) {
                final var headers = context.headers();
                final KeyValue<K, byte[]> record = all.next();
            }
        }
    });
}
When I add an event to the stream, I get the message in the schedule callback, but it's only executed once.
My understanding was that it should be executed at every interval configured by scanFrequency.
Any idea what I'm doing wrong here?
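
For reference, the two punctuation types behave quite differently: PunctuationType.STREAM_TIME punctuations only fire when new input records arrive and advance stream time past the interval, whereas PunctuationType.WALL_CLOCK_TIME punctuations fire on a wall-clock timer even when no records arrive. A minimal illustration (purgeOldEntries is a placeholder for the store-scanning loop above, not part of the original code):

@Override
public void init(ProcessorContext context) {
    this.stateStore = context.getStateStore(purgeStoreName);

    // Data-driven: runs only when incoming records advance stream time by scanFrequency.
    context.schedule(scanFrequency, PunctuationType.STREAM_TIME, this::purgeOldEntries);

    // Wall-clock-driven: runs roughly every scanFrequency even when no records arrive.
    context.schedule(scanFrequency, PunctuationType.WALL_CLOCK_TIME, this::purgeOldEntries);
}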

Many-to-One records Kafka Streams

I would like to turn many records into one per message. I tried many things, like custom reducers and aggregators, but they all still send records back out one-to-one. For example, I would like to convert many strings into just one string. If my stream has messages with the same key but different values, "the", "sky", "is", "blue", then I would like to output one concatenation of them to a new topic: "the,sky,is,blue,". What I am instead getting is 4 messages: "the,", "the,sky,", "the,sky,is,", "the,sky,is,blue,". When I send a second message to the Kafka consumer, it concatenates onto the previous aggregation and I eventually receive "the,sky,is,blue,the,sky,is,blue,".
I also tried using a custom StoreBuilder and changing a lot of the settings to see if that would do anything.
Map<String, String> changelogConfig = new HashMap<>();
changelogConfig.put("message.down.conversion.enable", "true");
changelogConfig.put("flush.messages", "0");
changelogConfig.put("flush.ms", "0");

StoreBuilder<KeyValueStore<String, String>> aggStoreSupplier = Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore("AggStore"),
        Serdes.String(),
        Serdes.String())
        .withLoggingEnabled(changelogConfig);

KStream<String, String> results = source // single messages get processed and eventually I get these string results I need to concatenate
        .groupByKey() // this KGroupedStream has the N records that were sent in the message
        .reduce(new Reducer<String>() {
            @Override
            public String apply(String aggValue, String value) {
                return value + "," + aggValue;
            }
        }, Materialized.as("AggStore"))
        .toStream();

results.to("results", Produced.with(Serdes.String(), Serdes.String()));
final Topology topology = builder.build(); // to describe topology
System.out.println(topology.describe());  // to print description

final KafkaStreams streams = new KafkaStreams(topology, props);
final CountDownLatch latch = new CountDownLatch(1);

// attach shutdown handler to catch control-c
Runtime.getRuntime().addShutdownHook(new Thread("streams-shutdown-hook") {
    @Override
    public void run() {
        streams.close();
        latch.countDown();
    }
});

try {
    streams.cleanUp();
    streams.start();
    latch.await();
} catch (Throwable e) {
    System.exit(1);
}
System.exit(0);
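
For what it's worth, and this is a hedged suggestion rather than code from the question: a KTable aggregation emits an updated result for every input record by design, so getting a single concatenated record per key usually means holding back intermediate updates, for example with suppress(). A sketch of how that could look applied to the reduce above (Suppressed comes from org.apache.kafka.streams.kstream; the 10-second limit is an arbitrary placeholder):

KStream<String, String> results = source
        .groupByKey()
        .reduce((aggValue, value) -> value + "," + aggValue, Materialized.as("AggStore"))
        // Hold back intermediate updates and emit only the latest value per key
        // once no new record for that key has arrived for 10 seconds.
        // Note: emission is driven by stream time, so it still needs incoming traffic to advance.
        .suppress(Suppressed.untilTimeLimit(Duration.ofSeconds(10), Suppressed.BufferConfig.unbounded()))
        .toStream();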

Kafka streams KTable creation from a given topic

I'm doing a project and I'm stuck on the KTable.
I want to take records from a topic and put them in a KTable (store), so that I have one record per key.
static KafkaStreams streams;

final Serde<Long> longSerde = Serdes.Long();
final Serde<byte[]> byteSerde = Serdes.ByteArray();

static String topicName;
static String storeName;

final StreamsBuilder builder = new StreamsBuilder();

KStream<Long, byte[]> streamed = builder.stream(topicName, Consumed.with(longSerde, byteSerde));
KTable<Long, byte[]> records = streamed.groupByKey().reduce(
        new Reducer<Long>() {
            @Override
            public Long apply(Long aggValue, Long newValue) {
                return newValue;
            }
        },
        storeName);
This is the closest I've got to the answer, I think.
Your approach is correct, but you need to use the correct serdes.
In the .reduce() function, the value type should be byte[].
KStream<Long, byte[]> streamed = builder.stream(topicName, Consumed.with(longSerde, byteSerde));
KTable<Long, byte[]> records = streamed.groupByKey().reduce(
        new Reducer<byte[]>() {
            @Override
            public byte[] apply(byte[] aggValue, byte[] newValue) {
                return newValue;
            }
        },
        Materialized.<Long, byte[], KeyValueStore<Bytes, byte[]>>as(storeName)
                .withKeySerde(longSerde)
                .withValueSerde(byteSerde));
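
If the goal is then to read this one-record-per-key view back out, the materialized store can be queried once the application is running. A short illustration using the Kafka 2.5+ interactive-query API (the streams variable and the key 42L are placeholders, not from the question):

// Assumes `streams` was built from this topology and has been started.
ReadOnlyKeyValueStore<Long, byte[]> view = streams.store(
        StoreQueryParameters.fromNameAndType(storeName, QueryableStoreTypes.keyValueStore()));

byte[] latest = view.get(42L); // latest value written for key 42, or null if none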

Can't get results from flink SQL query

I'm facing a problem where I don't get results from my query in Flink SQL.
I have some information stored in two Kafka topics. I want to load them into two tables and perform a join between them in a streaming fashion.
These are my Flink instructions:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

// configure Kafka consumer
Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // Broker default host:port
props.setProperty("group.id", "flink-consumer");          // Consumer group ID

FlinkKafkaConsumer011<Blocks> flinkBlocksConsumer = new FlinkKafkaConsumer011<>(args[0], new BlocksSchema(), props);
flinkBlocksConsumer.setStartFromEarliest();

FlinkKafkaConsumer011<Transactions> flinkTransactionsConsumer = new FlinkKafkaConsumer011<>(args[1], new TransactionsSchema(), props);
flinkTransactionsConsumer.setStartFromEarliest();

DataStream<Blocks> blocks = env.addSource(flinkBlocksConsumer);
DataStream<Transactions> transactions = env.addSource(flinkTransactionsConsumer);

tableEnv.registerDataStream("blocksTable", blocks);
tableEnv.registerDataStream("transactionsTable", transactions);
Here is my SQL query :
Table sqlResult = tableEnv.sqlQuery(
        "SELECT block_timestamp,count(tx_hash) " +
        "FROM blocksTable " +
        "JOIN transactionsTable " +
        "ON blocksTable.block_hash=transactionsTable.tx_hash " +
        "GROUP BY blocksTable.block_timestamp");

DataStream<Test> resultStream = tableEnv
        .toRetractStream(sqlResult, Row.class)
        .map(t -> {
            Row r = t.f1;
            String field2 = r.getField(0).toString();
            long count = Long.valueOf(r.getField(1).toString());
            return new Test(field2, count);
        })
        .returns(Test.class);
Then, I print the results :
resultStream.print();
But I don't get any results; my program just hangs...
As for the schemas used for serialization and deserialization, here is my Test class, which stores the result of my query (two fields, a String and a long, for the block_timestamp and the count respectively):
public class TestSchema implements DeserializationSchema<Test>, SerializationSchema<Test> {

    @Override
    public Test deserialize(byte[] message) throws IOException {
        return Test.fromString(new String(message));
    }

    @Override
    public boolean isEndOfStream(Test nextElement) {
        return false;
    }

    @Override
    public byte[] serialize(Test element) {
        return element.toString().getBytes();
    }

    @Override
    public TypeInformation<Test> getProducedType() {
        return TypeInformation.of(Test.class);
    }
}
The same principle applies to the BlocksSchema and TransactionsSchema classes.
Do you know why I can't get the result of my query? Should I test with a BatchExecutionEnvironment instead?
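
One detail worth double-checking (an assumption on my part, since the rest of the main method isn't shown): a Flink DataStream job is only assembled lazily and does not run until execute() is called on the environment, so print() on its own produces no output. The job name below is just a placeholder:

resultStream.print();

// Without this call the job graph is only built, never actually executed.
env.execute("blocks-transactions-join");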
