How to process and aggregate Kafka Streams with custom Objects? - java

So basically I have an Account class and some data. I want to send those objects to my topic with a producer, and that part is fine for now. Later on I want to do aggregation with Kafka Streams, but I can't, because I think some Serde property in my configuration is wrong. I don't know where the error is. My producer works fine, but I can't aggregate. Could anyone have a look at my Kafka Streams code?
My Account class:
public class Account {
private long fromId;
private long amount;
private long toId;
private ZonedDateTime time;
}
There are two classes, a Serializer and a Deserializer, for my Account class. Serializer:
public class AccountSerializer implements Serializer {
private static final Charset CHARSET = Charset.forName("UTF-8");
static private Gson gson = new Gson();
@Override
public void configure(Map map, boolean b) {
}
@Override
public byte[] serialize(String s, Object o) {
String line = gson.toJson(o);
// Return the bytes from the String 'line'
return line.getBytes(CHARSET);
}
@Override
public void close() {
}
}
Deserializer:
public class AccountDeserializer implements Deserializer {
private static final Charset CHARSET = Charset.forName("UTF-8");
static private Gson gson;
static {
gson = new Gson();
}
@Override
public void configure(Map map, boolean b) {
}
@Override
public Object deserialize(String s, byte[] bytes) {
try {
// Transform the bytes to String
String person = new String(bytes, CHARSET);
// Return the Person object created from the String 'person'
return gson.fromJson(person, Account.class);
} catch (Exception e) {
throw new IllegalArgumentException("Error reading bytes! Yanlış", e);
}
}
@Override
public void close() {
}
}
My AccountSerde class for kafka streams:
public class AccountSerde implements Serde<Object> {
private AccountSerializer accountSerializer;
private AccountDeserializer accountDeserializer;
@Override
public void configure(Map<String, ?> map, boolean b) {
}
@Override
public void close() {
accountSerializer.close();
accountDeserializer.close();
}
@Override
public Serializer<Object> serializer() {
return accountSerializer;
}
@Override
public Deserializer<Object> deserializer() {
return accountDeserializer;
}
}
And my Kafka Producer:
public static void main(String[] args) {
DataAccess dataAccess = new DataAccess();
List<Account> accountList = dataAccess.read();
final Logger logger = LoggerFactory.getLogger(Producer.class);
Properties properties = new Properties();
properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,"127.0.0.1:9092");
properties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,LongSerializer.class.getName());
properties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,AccountSerializer.class.getName());
KafkaProducer<Long,Account> producer = new KafkaProducer<>(properties);
for (Account account : accountList) {
ProducerRecord<Long,Account> record = new ProducerRecord<Long, Account>("bank_account",account.getFromId(),account);
producer.send(record, new Callback() {
public void onCompletion(RecordMetadata recordMetadata, Exception e) {
if (e == null) {
logger.info("Record sent successfully. \n "+ "Topic : "+recordMetadata.topic() +"\n"+
"Partition : " + recordMetadata.partition() + "\n"+
"Offset : " +recordMetadata.offset() +"\n"+
"Timestamp: " +recordMetadata.timestamp() +"\n");
try {
Thread.sleep(1000);
} catch (InterruptedException e1) {
e1.printStackTrace();
}
}
else{
logger.info("Error sending producer");
}
}
});
}
producer.flush();
producer.close();
}
And here is the class where I want to try the aggregation, my Kafka Streams class.
public static void main(String[] args) {
System.out.println();
Properties properties = new Properties();
properties.setProperty(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG,"127.0.01:9092");
properties.setProperty(StreamsConfig.APPLICATION_ID_CONFIG,"demo-kafka-streams");
properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,AccountDeserializer.class.getName());
properties.setProperty(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.LongSerde);
properties.setProperty(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, AccountSerde.class.getName());
//create a topology
StreamsBuilder streamsBuilder = new StreamsBuilder();
KStream<Long, Account> inputTopic = streamsBuilder.stream("bank_account");
KTable<Long, Long> aggregate = inputTopic.groupByKey().aggregate(
() -> 0L,
(key, current, oldBalance) -> current.getAmount() + oldBalance);
aggregate.toStream().to("son");
KafkaStreams streams = new KafkaStreams(streamsBuilder.build(),properties);
streams.start();
System.out.println(streams.toString());
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
}
I have tested my producer; it works fine and sends the objects. However, because of the error below I can't tell whether my aggregation code works or not. It gives me:
[demo-kafka-streams-9e3b0ab8-c021-4707-bf85-174e1356ea89-StreamThread-1] ERROR org.apache.kafka.streams.errors.LogAndFailExceptionHandler - Exception caught during Deserialization, taskId: 0_0, topic: bank_account, partition: 0, offset: 0
java.lang.NullPointerException
at org.apache.kafka.streams.processor.internals.SourceNode.deserializeValue(SourceNode.java:63)
at org.apache.kafka.streams.processor.internals.RecordDeserializer.deserialize(RecordDeserializer.java:66)
at org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:97)
at org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:117)
at org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:638)
at org.apache.kafka.streams.processor.internals.StreamThread.addRecordsToTasks(StreamThread.java:936)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:831)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
[demo-kafka-streams-9e3b0ab8-c021-4707-bf85-174e1356ea89-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [demo-kafka-streams-9e3b0ab8-c021-4707-bf85-174e1356ea89-StreamThread-1] State transition from RUNNING to PENDING_SHUTDOWN
[demo-kafka-streams-9e3b0ab8-c021-4707-bf85-174e1356ea89-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [demo-kafka-streams-9e3b0ab8-c021-4707-bf85-174e1356ea89-StreamThread-1] Shutting down
[demo-kafka-streams-9e3b0ab8-c021-4707-bf85-174e1356ea89-StreamThread-1] INFO org.apache.kafka.clients.producer.KafkaProducer - [Producer clientId=demo-kafka-streams-9e3b0ab8-c021-4707-bf85-174e1356ea89-StreamThread-1-producer] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
[demo-kafka-streams-9e3b0ab8-c021-4707-bf85-174e1356ea89-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [demo-kafka-streams-9e3b0ab8-c021-4707-bf85-174e1356ea89-StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD
[demo-kafka-streams-9e3b0ab8-c021-4707-bf85-174e1356ea89-StreamThread-1] INFO org.apache.kafka.streams.KafkaStreams - stream-client [demo-kafka-streams-9e3b0ab8-c021-4707-bf85-174e1356ea89] State transition from RUNNING to ERROR
[demo-kafka-streams-9e3b0ab8-c021-4707-bf85-174e1356ea89-StreamThread-1] WARN org.apache.kafka.streams.KafkaStreams - stream-client [demo-kafka-streams-9e3b0ab8-c021-4707-bf85-174e1356ea89] All stream threads have died. The instance will be in error state and should be closed.
[demo-kafka-streams-9e3b0ab8-c021-4707-bf85-174e1356ea89-StreamThread-1] INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [demo-kafka-streams-9e3b0ab8-c021-4707-bf85-174e1356ea89-StreamThread-1] Shutdown complete
Exception in thread "demo-kafka-streams-9e3b0ab8-c021-4707-bf85-174e1356ea89-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: Deserialization exception handler is set to fail upon a deserialization error. If you would rather have the streaming pipeline continue after a deserialization error, please set the default.deserialization.exception.handler appropriately.
at org.apache.kafka.streams.processor.internals.RecordDeserializer.deserialize(RecordDeserializer.java:80)
at org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:97)
at org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:117)
at org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:638)
at org.apache.kafka.streams.processor.internals.StreamThread.addRecordsToTasks(StreamThread.java:936)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:831)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Caused by: java.lang.NullPointerException
at org.apache.kafka.streams.processor.internals.SourceNode.deserializeValue(SourceNode.java:63)
at org.apache.kafka.streams.processor.internals.RecordDeserializer.deserialize(RecordDeserializer.java:66)
... 7 more

You're never initializing the fields, so you're getting an NPE.
You should also change the Serde type to your actual class:
public class AccountSerde implements Serde<Account> {
// These are both null unless you initialize them
private AccountSerializer accountSerializer;
private AccountDeserializer accountDeserializer;
Also, you'll need to fix your IP address from this value, which is not a valid IP
"127.0.01:9092"

Related

Got a java.lang.IllegalArgumentException when sending a Java object by the dynamic TCP/IP integration flow?

Gary Russell helped me some time ago with the following 'DynamicTcpServer' flow (see Building a TCP/IP server with SI's dynamic flow registration), which now has a message service injected that provides the message to send as soon as a client connects:
public class DynamicTcpServer implements TcpServer {
@Autowired
private IntegrationFlowContext flowContext;
@Autowired
private ApplicationContext appContext;
private final Map<String, IntegrationFlowRegistration> registrations = new HashMap<>();
private final Map<String, String> clients = new ConcurrentHashMap<>();
private final Map<String, TcpServerSpec> sockets;
private final MessageService messenger;
@Autowired
public DynamicTcpServer(MessageService messenger, Map<String, TcpServerSpec> sockets) {
this.messenger = messenger;
this.sockets = sockets;
}
@Override
public void start(String context) {
start(context, sockets.get(context).getPort());
}
@Override
public void start(String context, int port) {
if (this.registrations.containsKey(context)) {
/* already running */
}
else {
TcpServerConnectionFactorySpec server = Tcp.netServer(port).id(context).serializer(TcpCodecs.lf());
server.get().registerListener(msg -> false); // dummy listener so the accept thread doesn't exit
IntegrationFlow flow = f -> f.handle(Tcp.outboundAdapter(server));
this.registrations.put(context, flowContext.registration(flow).register());
}
}
@Override
public Set<String> running() {
return registrations.keySet();
}
@Override
public void stop(String context) {
IntegrationFlowRegistration registration = this.registrations.remove(context);
if (registration != null) {
registration.destroy();
}
}
@EventListener
public void connect(TcpConnectionOpenEvent event) {
String connectionId = event.getConnectionId();
this.clients.put(connectionId, event.getConnectionFactoryName());
}
@EventListener
public void closed(TcpConnectionCloseEvent event) {
this.clients.remove(event.getConnectionId());
}
@EventListener
public void listening(TcpConnectionServerListeningEvent event) {
}
@Scheduled(
fixedDelayString = "${com.harry.potter.scheduler.fixed-delay}",
initialDelayString = "${com.harry.potter.scheduler.initial-delay}"
)
public void sender() {
this.clients.forEach((connectId, context) -> {
IntegrationFlowRegistration register = registrations.get(context);
if (register != null) {
try {
while (true) {
List<ServerMessage> msgs = messenger.getMessagesToSend(sockets.get(context));
msgs.stream().forEach(msg ->
register.getMessagingTemplate().send(
MessageBuilder.withPayload(msg)
.setHeader(IpHeaders.CONNECTION_ID, connectId).build()));
}
}
catch (NoMessageToSendException nm) {
appContext.getBean(context, TcpNetServerConnectionFactory.class)
.closeConnection(connectId);
}
}
});
}
}
The message service returns a Java object, 'com.harry.potter.entity.ServerMessage', to be sent.
So I assume I have to add some other kind of converter at '.serializer(TcpCodecs.lf())', because I got an exception saying:
2022-04-17 04:00:45.729 DEBUG [] --- [pool-283-thread-1] c.l.c.c.cas.service.DynamicTcpServer : sender: send 1 messages to potter1
2022-04-17 04:00:45.738 DEBUG [] --- [pool-283-thread-1] c.l.c.c.c.service.DynamicTcpServer : closed event=TcpConnectionCloseEvent [source=TcpNetConnection:harry.potter.de:56746:17584:76adefe0-0881-4e4b-be2b-0ced47f950ae], [factory=potter1, connectionId=harry.potter.de:56746:17584:76adefe0-0881-4e4b-be2b-0ced47f950ae] **CLOSED**
2022-04-17 04:00:45.740 ERROR [] --- [pool-283-thread-1] o.s.i.ip.tcp.TcpSendingMessageHandler : Error sending message
org.springframework.messaging.MessagingException: Send Failed; nested exception is java.lang.IllegalArgumentException: When using a byte array serializer, the socket mapper expects either a byte array or String payload, but received: class com.harry.potter.entity.ServerMessage
at org.springframework.integration.ip.tcp.connection.TcpNetConnection.send(TcpNetConnection.java:118)
at org.springframework.integration.ip.tcp.TcpSendingMessageHandler.handleMessageAsServer(TcpSendingMessageHandler.java:119)
at org.springframework.integration.ip.tcp.TcpSendingMessageHandler.handleMessageInternal(TcpSendingMessageHandler.java:103)
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:62)
at org.springframework.integration.dispatcher.AbstractDispatcher.tryOptimizedDispatch(AbstractDispatcher.java:115)
at org.springframework.integration.dispatcher.UnicastingDispatcher.doDispatch(UnicastingDispatcher.java:133)
at org.springframework.integration.dispatcher.UnicastingDispatcher.dispatch(UnicastingDispatcher.java:106)
at org.springframework.integration.channel.AbstractSubscribableChannel.doSend(AbstractSubscribableChannel.java:72)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:570)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:520)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:187)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:166)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:47)
at org.springframework.messaging.core.AbstractMessageSendingTemplate.send(AbstractMessageSendingTemplate.java:109)
at org.springframework.messaging.core.AbstractMessageSendingTemplate.send(AbstractMessageSendingTemplate.java:99)
at com.harry.potter.service.DynamicTcpServer.lambda$sender$2(DynamicTcpServer.java:125)
at com.harry.potter.service.DynamicTcpServer$$Lambda$40600/0x000000006f511b08.accept(Unknown Source)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762)
at com.harry.potter.service.DynamicTcpServer.lambda$sender$3(DynamicTcpServer.java:124)
at com.harry.potter.service.DynamicTcpServer$$Lambda$40552/0x000000003344f3b0.accept(Unknown Source)
at java.base/java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1603)
at com.harry.potter.service.DynamicTcpServer.sender(DynamicTcpServer.java:115)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84)
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:884)
Caused by: java.lang.IllegalArgumentException: When using a byte array serializer, the socket mapper expects either a byte array or String payload, but received: class com.harry.potter.entity.ServerMessage
at org.springframework.integration.ip.tcp.connection.TcpMessageMapper.getPayloadAsBytes(TcpMessageMapper.java:277)
at org.springframework.integration.ip.tcp.connection.TcpMessageMapper.fromMessage(TcpMessageMapper.java:252)
at org.springframework.integration.ip.tcp.connection.TcpNetConnection.send(TcpNetConnection.java:111)
... 34 common frames omitted
Which converter (serializer) do I have to use and how to plug it in my DynamicTcpServer exactly?
EDIT 1
The message service messenger returns a Java object 'com.harry.potter.entity.ServerMessage' to be sent. The ServerMessage contains an int field holding the message length and a String field holding the message text:
public class ServerMessage implements Serializable {
private static final long serialVersionUID = -1L;
private int len;
private String message;
/* getters & setters */
}
I am trying to migrate from a C/C++ function which writes the C Struct
struct C_MSG
{
int len; /* Length field */
char text[MAX_MSG_LEN]; /* Data field */
} c_msg;
to a consumer using the C Socket library send function writing a given number of bytes (length of text + 4) from the given memory address to the given TCP/IP socket.
I am looking for a Transformer to prepare the same binary content for the message consumer. Otherwise the consumer will not be able to cope with the message.
Following the comments and looking at GenericTransformer<S, T>, the transformation could be done in a single lambda expression. The source of the transformation would be a ServerMessage object, and the result should be a byte array produced with Spring's utility:
.transform(s -> SerializationUtils.serialize(s))
Would that be the right lambda expression? Or do I need a custom Transformer, implementing a specific interface, to have more control over the serialization process in case my consumer expects Intel (little-endian) rather than Motorola (big-endian) byte order? Which interface? Or is there a much easier solution?
You need to add a .transform() before that .handle(Tcp.outboundAdapter(server)) to convert your ServerMessage to a byte[] or String; that's what the TcpMessageMapper expects by default.
Of course, I could recommend looking into the mapper(TcpMessageMapper mapper) option of Tcp.netServer() and its bytesMessageMapper property, but the outcome would be just the same.
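For illustration, a rough sketch of such a .transform() step, based on the flow and the ServerMessage from the question; it assumes the C consumer expects a 4-byte length field followed by the text, that ServerMessage exposes getMessage(), and that the charset and byte order below match what the C side uses:
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
// Inside start(String context, int port), replacing the existing flow definition:
IntegrationFlow flow = f -> f
        // Turn the ServerMessage into the raw bytes the C consumer expects:
        // a 4-byte length field followed by the message text.
        .transform(ServerMessage.class, msg -> {
            byte[] text = msg.getMessage().getBytes(StandardCharsets.US_ASCII); // charset is an assumption
            ByteBuffer buf = ByteBuffer.allocate(4 + text.length)
                    .order(ByteOrder.BIG_ENDIAN); // use LITTLE_ENDIAN if the C side is little-endian
            buf.putInt(text.length);
            buf.put(text);
            return buf.array();
        })
        .handle(Tcp.outboundAdapter(server));
Note that TcpCodecs.lf() also appends a line feed to every outgoing frame; if the consumer only expects the length-prefixed payload, TcpCodecs.lengthHeader4() (which writes the 4-byte length prefix itself) may be a better fit, in which case the transform would only need to return the text bytes.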

Spring Kafka: Close the container and read the messages from specific offset with ConcurrentKafkaListenerContainerFactory

In my Spring Kafka application, I want to trigger the consumer at run time according to input from a scheduler. The scheduler will tell the listener which topic it can start consuming messages from. It is a Spring Boot application with a custom ConcurrentKafkaListenerContainerFactory. I need to perform three tasks:
Close the container after successfully reading all the messages available on the topic.
Store the current offset in a DB or file system.
Next time the consumer comes up, use the stored offset to process the records instead of the default offset managed by Kafka, so that in the future we can change the offset value in the DB and get the desired reports.
I know how to handle all of this with @KafkaListener, but I'm not sure how to hook it into ConcurrentKafkaListenerContainerFactory. The current code is listed below:
@SpringBootApplication
public class KafkaApp{
public static void main(String[] args) {
SpringApplication.run(KafkaApp.class, args);
}
@Bean
public NewTopic topic() {
return TopicBuilder.name("testTopic").partitions(1).replicas(1).build();
}
}
@Component
class Listener {
private static final Logger log = LoggerFactory.getLogger(Listener.class);
private static final Method otherListen;
static {
try {
otherListen = Listener.class.getDeclaredMethod("otherListen", List.class);
}
catch (NoSuchMethodException | SecurityException ex) {
throw new IllegalStateException(ex);
}
}
private final ConcurrentKafkaListenerContainerFactory<String, String> factory;
private final MessageHandlerMethodFactory methodFactory;
private final KafkaAdmin admin;
private final KafkaTemplate<String, String> template;
public Listener(ConcurrentKafkaListenerContainerFactory<String, String> factory, KafkaAdmin admin,
KafkaTemplate<String, String> template, KafkaListenerAnnotationBeanPostProcessor<?, ?> bpp) {
this.factory = factory;
this.admin = admin;
this.template = template;
this.methodFactory = bpp.getMessageHandlerMethodFactory();
}
#KafkaListener(id = "myId", topics = "testTopic")
public void listen(String topicName) {
try (AdminClient client = AdminClient.create(this.admin.getConfigurationProperties())) {
NewTopic topic = TopicBuilder.name(topicName).build();
client.createTopics(List.of(topic)).all().get(10, TimeUnit.SECONDS);
}
catch (Exception e) {
log.error("Failed to create topic", e);
}
ConcurrentMessageListenerContainer<String, String> container =
this.factory.createContainer(new TopicPartitionOffset(topicName, 0));
BatchMessagingMessageListenerAdapter<String, String> adapter =
new BatchMessagingMessageListenerAdapter<>(this, otherListen);
adapter.setHandlerMethod(new HandlerAdapter(
this.methodFactory.createInvocableHandlerMethod(this, otherListen)));
FilteringBatchMessageListenerAdapter<String, String> filtered =
new FilteringBatchMessageListenerAdapter<>(adapter, record -> !record.key().equals("foo"));
container.getContainerProperties().setMessageListener(filtered);
container.getContainerProperties().setGroupId("group.for." + topicName);
container.setBeanName(topicName + ".container");
container.start();
IntStream.range(0, 10).forEach(i -> this.template.send(topicName, 0, i % 2 == 0 ? "foo" : "bar", "test" + i));
}
void otherListen(List<String> others) {
log.info("Others: {}", others);
}
}
EDIT
@SpringBootApplication
public class KafkaApp{
public static void main(String[] args) {
SpringApplication.run(KafkaApp.class, args);
}
@Bean
public NewTopic topic() {
return TopicBuilder.name("testTopic").partitions(1).replicas(1).build();
}
}
@Component
class Listener {
private static final Logger log = LoggerFactory.getLogger(Listener.class);
private static final Method otherListen;
static {
try {
otherListen = Listener.class.getDeclaredMethod("otherListen", List.class);
}
catch (NoSuchMethodException | SecurityException ex) {
throw new IllegalStateException(ex);
}
}
private final ConcurrentKafkaListenerContainerFactory<String, String> factory;
private final MessageHandlerMethodFactory methodFactory;
private final KafkaAdmin admin;
private final KafkaTemplate<String, String> template;
public Listener(ConcurrentKafkaListenerContainerFactory<String, String> factory, KafkaAdmin admin,
KafkaTemplate<String, String> template, KafkaListenerAnnotationBeanPostProcessor<?, ?> bpp) {
this.factory = factory;
this.admin = admin;
this.template = template;
this.methodFactory = bpp.getMessageHandlerMethodFactory();
}
#KafkaListener(id = "myId", topics = "testTopic")
public void listen(String topicName) {
try (AdminClient client = AdminClient.create(this.admin.getConfigurationProperties())) {
NewTopic topic = TopicBuilder.name(topicName).build();
client.createTopics(List.of(topic)).all().get(10, TimeUnit.SECONDS);
}
catch (Exception e) {
log.error("Failed to create topic", e);
}
ConcurrentMessageListenerContainer<String, String> container =
this.factory.createContainer(new TopicPartitionOffset(topicName, 0));
BatchMessagingMessageListenerAdapter<String, String> adapter =
new BatchMessagingMessageListenerAdapter<>(this, otherListen);
adapter.setHandlerMethod(new HandlerAdapter(
this.methodFactory.createInvocableHandlerMethod(this, otherListen)));
FilteringBatchMessageListenerAdapter<String, String> filtered =
new FilteringBatchMessageListenerAdapter<>(adapter, record -> !record.key().equals("foo"));
container.getContainerProperties().setMessageListener(filtered);
container.getContainerProperties().setGroupId("group.for." + topicName);
container.setBeanName(topicName + ".container");
container.getContainerProperties().setIdleEventInterval(3000L);
container.start();
IntStream.range(0, 10).forEach(i -> this.template.send(topicName, 0, i % 2 == 0 ? "foo" : "bar", "test" + i));
}
void otherListen(List<String> others) {
log.info("Others: {}", others);
}
@EventListener
public void eventHandler(ListenerContainerIdleEvent event) {
logger.info("No messages received for " + event.getIdleTime() + " milliseconds");
}
}
You can receive ListenerContainerIdleEvents when there are no messages left to process; you can use this event to stop the container; you should perform the stop() on a different thread (not the one that publishes the event).
See How to check if Kafka is empty using Spring Kafka?
You can get the partition/offset in several ways.
void otherListen(List<ConsumerRecord<..., ...>>)
or
void otherListen(List<String> others,
@Header(KafkaHeaders.RECEIVED_PARTITION) List<Integer> partitions,
@Header(KafkaHeaders.OFFSET) List<Long> offsets)
You can specify the starting offset in the
new TopicPartitionOffset(topicName, 0, startOffset)
when creating the container.
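For the use case in the question (resuming from an offset persisted in a DB), that could look roughly like this; offsetStore and loadOffset() are placeholders for whatever persistence mechanism you use:
// Hypothetical lookup of the last stored offset for this topic/partition (DB, file system, ...)
long startOffset = offsetStore.loadOffset(topicName, 0);
ConcurrentMessageListenerContainer<String, String> container =
        this.factory.createContainer(new TopicPartitionOffset(topicName, 0, startOffset));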
EDIT
To stop the container when it is idle, set the idleEventInterval, add an @EventListener method, and stop the container there.
TaskExecutor exec = new SimpleAsyncTaskExecutor();
@EventListener
void idle(ListenerContainerIdleEvent event) {
log...
this.exec.execute(() -> event.getContainer(ConcurrentMessageListenerContainer.class).stop());
}
If you add concurrency to your containers, you would need for each child container to go idle before stopping the parent container.
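A rough sketch of that bookkeeping, reusing the TaskExecutor from the snippet above; it assumes each child container publishes its own idle event with a distinct listener id:
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
// Tracks which child containers have gone idle
private final Set<String> idleChildren = ConcurrentHashMap.newKeySet();
@EventListener
public void idle(ListenerContainerIdleEvent event) {
    ConcurrentMessageListenerContainer<?, ?> parent =
            event.getContainer(ConcurrentMessageListenerContainer.class);
    idleChildren.add(event.getListenerId());
    if (idleChildren.size() == parent.getConcurrency()) {
        // never stop the container on the consumer thread that published the event
        this.exec.execute(parent::stop);
    }
}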
EDIT2
I just added it to the code I wrote for the answer to your other question and it works exactly as expected.
#KafkaListener(id = "so69134055", topics = "so69134055")
public void listen(String topicName) {
try (AdminClient client = AdminClient.create(this.admin.getConfigurationProperties())) {
NewTopic topic = TopicBuilder.name(topicName).build();
client.createTopics(List.of(topic)).all().get(10, TimeUnit.SECONDS);
}
catch (Exception e) {
log.error("Failed to create topic", e);
}
ConcurrentMessageListenerContainer<String, String> container =
this.factory.createContainer(new TopicPartitionOffset(topicName, 0));
BatchMessagingMessageListenerAdapter<String, String> adapter =
new BatchMessagingMessageListenerAdapter<>(this, otherListen);
adapter.setHandlerMethod(new HandlerAdapter(
this.methodFactory.createInvocableHandlerMethod(this, otherListen)));
FilteringBatchMessageListenerAdapter<String, String> filtered =
new FilteringBatchMessageListenerAdapter<>(adapter, record -> !record.key().equals("foo"));
container.getContainerProperties().setMessageListener(filtered);
container.getContainerProperties().setGroupId("group.for." + topicName);
container.getContainerProperties().setIdleEventInterval(3000L);
container.setBeanName(topicName + ".container");
container.start();
IntStream.range(0, 10).forEach(i -> this.template.send(topicName, 0, i % 2 == 0 ? "foo" : "bar", "test" + i));
}
void otherListen(List<String> others) {
log.info("Others: {}", others);
}
TaskExecutor exec = new SimpleAsyncTaskExecutor();
@EventListener
public void idle(ListenerContainerIdleEvent event) {
log.info(event.toString());
this.exec.execute(() -> {
ConcurrentMessageListenerContainer container = event.getContainer(ConcurrentMessageListenerContainer.class);
log.info("stopping container: " + container.getBeanName());
container.stop();
});
}
[foo.container-0-C-1] Others: [test0, test2, test4, test6, test8]
[foo.container-0-C-1] ListenerContainerIdleEvent [idleTime=5.007s, listenerId=foo.container-0, container=KafkaMessageListenerContainer [id=foo.container-0, clientIndex=-0, topicPartitions=[foo-0]], paused=false, topicPartitions=[foo-0]]
[SimpleAsyncTaskExecutor-1] stopping container: foo.container
[foo.container-0-C-1] [Consumer clientId=consumer-group.for.foo-2, groupId=group.for.foo] Unsubscribed all topics or patterns and assigned partitions
[foo.container-0-C-1] Metrics scheduler closed
[foo.container-0-C-1] Closing reporter org.apache.kafka.common.metrics.JmxReporter
[foo.container-0-C-1] Metrics reporters closed
[foo.container-0-C-1] App info kafka.consumer for consumer-group.for.foo-2 unregistered
[foo.container-0-C-1] group.for.foo: Consumer stopped

Java - Flink sending empty object on kafka sink

In my Flink job I have a stream that I get from one Kafka topic, manipulate, and send back to Kafka using a sink.
public static void main(String[] args) throws Exception {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
Properties p = new Properties();
p.setProperty("bootstrap.servers", servers_ip_list);
p.setProperty("gropu.id", "Flink");
FlinkKafkaConsumer<Event_N> kafkaData_N =
new FlinkKafkaConsumer("CorID_0", new Ev_Des_Sch_N(), p);
WatermarkStrategy<Event_N> wmStrategy =
WatermarkStrategy
.<Event_N>forMonotonousTimestamps()
.withIdleness(Duration.ofMinutes(1))
.withTimestampAssigner((Event, timestamp) -> {
return Event.get_Time();
});
DataStream<Event_N> stream_N = env.addSource(
kafkaData_N.assignTimestampsAndWatermarks(wmStrategy));
The part above works fine, no problems at all; the part below is where I'm getting the issue.
String ProducerTopic = "CorID_0_f1";
DataStream<Stream_Blocker_Pojo.block> box_stream_p= stream_N
.keyBy((Event_N CorrID) -> CorrID.get_CorrID())
.map(new Stream_Blocker_Pojo());
FlinkKafkaProducer<Stream_Blocker_Pojo.block> myProducer = new FlinkKafkaProducer<>(
ProducerTopic,
new ObjSerializationSchema(ProducerTopic),
p,
FlinkKafkaProducer.Semantic.EXACTLY_ONCE); // fault-tolerance
box_stream_p.addSink(myProducer);
No errors, everything works fine. This is the Stream_Blocker_Pojo where I map a stream, manipulate it, and send out a new one. (I have simplified my code, keeping just 4 variables and removing all the math and data processing.)
public class Stream_Blocker_Pojo extends RichMapFunction<Event_N, Stream_Blocker_Pojo.block>
{
public class block {
public Double block_id;
public Double block_var2 ;
public Double block_var3;
public Double block_var4;}
private transient ValueState<block> state_a;
@Override
public void open(Configuration parameters) throws Exception {
state_a = getRuntimeContext().getState(new ValueStateDescriptor<>("BoxState_a", block.class));
}
public block map(Event_N input) throws Exception {
p1.Stream_Blocker_Pojo.block current_a = state_a.value();
if (current_a == null) {
current_a = new p1.Stream_Blocker_Pojo.block();
current_a.block_id = 0.0;
current_a.block_var2 = 0.0;
current_a.block_var3 = 0.0;
current_a.block_var4 = 0.0;}
current_a.block_id = input.f_num_id;
current_a.block_var2 = input.f_num_2;
current_a.block_var3 = input.f_num_3;
current_a.tblock_var4 = input.f_num_4;
state_a.update(current_a);
return new block();
};
}
This is the implementation of the Kafka Serialization schema.
public class ObjSerializationSchema implements KafkaSerializationSchema<Stream_Blocker_Pojo.block>{
private String topic;
private ObjectMapper mapper;
public ObjSerializationSchema(String topic) {
super();
this.topic = topic;
}
@Override
public ProducerRecord<byte[], byte[]> serialize(Stream_Blocker_Pojo.block obj, Long timestamp) {
byte[] b = null;
if (mapper == null) {
mapper = new ObjectMapper();
}
try {
b= mapper.writeValueAsBytes(obj);
} catch (JsonProcessingException e) {
}
return new ProducerRecord<byte[], byte[]>(topic, b);
}
}
When I open the messages that I sent from my Flink job in Kafka, I find that all the variables are "null":
CorrID b'{"block_id":null,"block_var1":null,"block_var2":null,"block_var3":null,"block_var4":null}
It looks like I'm sending out an empty object with no values, but I'm struggling to understand what I'm doing wrong. I think the problem could be in my implementation of Stream_Blocker_Pojo, or maybe in ObjSerializationSchema. Any help would be really appreciated. Thanks.
There are two probable issues here:
Are you sure the block instance you are passing doesn't have null fields? You may want to debug that part to be sure.
The reason may also be in the ObjectMapper: you should have getters and setters available for your block, otherwise Jackson may not be able to access the fields.
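As an illustration of the second point, a minimal sketch of the inner class with the accessors Jackson looks for (field names taken from the question):
// Declared static so it can be instantiated without the enclosing RichMapFunction;
// getters and setters expose the fields to Jackson (and to Flink's POJO serializer).
public static class block {
    public Double block_id;
    public Double block_var2;
    public Double block_var3;
    public Double block_var4;
    public Double getBlock_id() { return block_id; }
    public void setBlock_id(Double block_id) { this.block_id = block_id; }
    public Double getBlock_var2() { return block_var2; }
    public void setBlock_var2(Double block_var2) { this.block_var2 = block_var2; }
    public Double getBlock_var3() { return block_var3; }
    public void setBlock_var3(Double block_var3) { this.block_var3 = block_var3; }
    public Double getBlock_var4() { return block_var4; }
    public void setBlock_var4(Double block_var4) { this.block_var4 = block_var4; }
}
On the first point, note that the map() shown above returns new block() rather than the populated current_a, which by itself would explain the all-null JSON in the output.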

How to deserialize avro data using Apache Beam (KafkaIO)

I've only seen one thread containing information about the topic I've mentioned, which is:
How to Deserialising Kafka AVRO messages using Apache Beam
However, after trying a few variations of Kafka deserializers I still cannot deserialize the Kafka messages. Here's my code:
public class Readkafka {
private static final Logger LOG = LoggerFactory.getLogger(Readkafka.class);
public static void main(String[] args) throws IOException {
// Create the Pipeline object with the options we defined above.
Pipeline p = Pipeline.create(
PipelineOptionsFactory.fromArgs(args).withValidation().create());
PTransform<PBegin, PCollection<KV<action_states_pkey, String>>> kafka =
KafkaIO.<action_states_pkey, String>read()
.withBootstrapServers("mybootstrapserver")
.withTopic("action_States")
.withKeyDeserializer(MyClassKafkaAvroDeserializer.class)
.withValueDeserializer(StringDeserializer.class)
.updateConsumerProperties(ImmutableMap.of("schema.registry.url", (Object)"schemaregistryurl"))
.withMaxNumRecords(5)
.withoutMetadata();
p.apply(kafka)
.apply(Keys.<action_states_pkey>create())
}
where MyClassKafkaAvroDeserializer is
public class MyClassKafkaAvroDeserializer extends
AbstractKafkaAvroDeserializer implements Deserializer<action_states_pkey> {
@Override
public void configure(Map<String, ?> configs, boolean isKey) {
configure(new KafkaAvroDeserializerConfig(configs));
}
@Override
public action_states_pkey deserialize(String s, byte[] bytes) {
return (action_states_pkey) this.deserialize(bytes);
}
@Override
public void close() {} }
and the class action_states_pkey is code generated from avro tools using
java -jar pathtoavrotools/avro-tools-1.8.1.jar compile schema pathtoschema/action_states_pkey.avsc destination path
where the action_states_pkey.avsc is literally
{"type":"record","name":"action_states_pkey","namespace":"namespace","fields":[{"name":"ad_id","type":["null","int"]},{"name":"action_id","type":["null","int"]},{"name":"state_id","type":["null","int"]}]}
With this code I'm getting the error:
Caused by: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to my.mudah.beam.test.action_states_pkey
at my.mudah.beam.test.MyClassKafkaAvroDeserializer.deserialize(MyClassKafkaAvroDeserializer.java:20)
at my.mudah.beam.test.MyClassKafkaAvroDeserializer.deserialize(MyClassKafkaAvroDeserializer.java:1)
at org.apache.beam.sdk.io.kafka.KafkaUnboundedReader.advance(KafkaUnboundedReader.java:221)
at org.apache.beam.sdk.io.BoundedReadFromUnboundedSource$UnboundedToBoundedSourceAdapter$Reader.advanceWithBackoff(BoundedReadFromUnboundedSource.java:279)
at org.apache.beam.sdk.io.BoundedReadFromUnboundedSource$UnboundedToBoundedSourceAdapter$Reader.start(BoundedReadFromUnboundedSource.java:256)
at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:592)
... 14 more
It seems there's an error when trying to map the Avro data to my custom class?
Alternatively, I've tried the following code:
PTransform<PBegin, PCollection<KV<action_states_pkey, String>>> kafka =
KafkaIO.<action_states_pkey, String>read()
.withBootstrapServers("bootstrapserver")
.withTopic("action_states")
.withKeyDeserializerAndCoder((Class)KafkaAvroDeserializer.class, AvroCoder.of(action_states_pkey.class))
.withValueDeserializer(StringDeserializer.class)
.updateConsumerProperties(ImmutableMap.of("schema.registry.url", (Object)"schemaregistry"))
.withMaxNumRecords(5)
.withoutMetadata();
p.apply(kafka);
.apply(Keys.<action_states_pkey>create())
// .apply("ExtractWords", ParDo.of(new DoFn<action_states_pkey, String>() {
// @ProcessElement
// public void processElement(ProcessContext c) {
// action_states_pkey key = c.element();
// c.output(key.getAdId().toString());
// }
// }));
which does not give me any error until I try to print out the data. I have to verify that I'm successfully reading the data one way or another, so my intent here is to log the data to the console. If I uncomment the commented section I get the same error once again:
SEVERE: 2019-09-13T07:53:56.168Z: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to my.mudah.beam.test.action_states_pkey
at my.mudah.beam.test.Readkafka$1.processElement(Readkafka.java:151)
Another thing to note is that if I specify:
.updateConsumerProperties(ImmutableMap.of("specific.avro.reader", (Object)"true"))
it always gives me an error of
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 443
Caused by: org.apache.kafka.common.errors.SerializationException: Could not find class NAMESPACE.action_states_pkey specified in writer's schema whilst finding reader's schema for a SpecificRecord.
It seems there's something wrong with my approach?
If anyone has any experience reading AVRO data from Kafka Streams using Apache Beam, please do help me out. I greatly appreciate it.
Here's a snapshot of my package with the schema and class in it as well:
package/working path details
Thanks.
public class MyClassKafkaAvroDeserializer extends
AbstractKafkaAvroDeserializer
Your class extends AbstractKafkaAvroDeserializer, which returns a GenericRecord.
You need to convert the GenericRecord to your custom object.
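For example, a rough sketch against the deserializer from the question; the field names come from the .avsc above, while the setter names assume Avro's usual generated bean naming and may need adjusting to the actual generated class:
import org.apache.avro.generic.GenericRecord;
// In MyClassKafkaAvroDeserializer
@Override
public action_states_pkey deserialize(String s, byte[] bytes) {
    GenericRecord record = (GenericRecord) this.deserialize(bytes);
    action_states_pkey key = new action_states_pkey();
    key.setAdId((Integer) record.get("ad_id"));
    key.setActionId((Integer) record.get("action_id"));
    key.setStateId((Integer) record.get("state_id"));
    return key;
}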
OR
Use SpecificRecord for this as stated in one of the following answers:
/**
* Extends deserializer to support ReflectData.
*
* @param <V>
* value type
*/
public abstract class ReflectKafkaAvroDeserializer<V> extends KafkaAvroDeserializer {
private Schema readerSchema;
private DecoderFactory decoderFactory = DecoderFactory.get();
protected ReflectKafkaAvroDeserializer(Class<V> type) {
readerSchema = ReflectData.get().getSchema(type);
}
@Override
protected Object deserialize(
boolean includeSchemaAndVersion,
String topic,
Boolean isKey,
byte[] payload,
Schema readerSchemaIgnored) throws SerializationException {
if (payload == null) {
return null;
}
int schemaId = -1;
try {
ByteBuffer buffer = ByteBuffer.wrap(payload);
if (buffer.get() != MAGIC_BYTE) {
throw new SerializationException("Unknown magic byte!");
}
schemaId = buffer.getInt();
Schema writerSchema = schemaRegistry.getByID(schemaId);
int start = buffer.position() + buffer.arrayOffset();
int length = buffer.limit() - 1 - idSize;
DatumReader<Object> reader = new ReflectDatumReader(writerSchema, readerSchema);
BinaryDecoder decoder = decoderFactory.binaryDecoder(buffer.array(), start, length, null);
return reader.read(null, decoder);
} catch (IOException e) {
throw new SerializationException("Error deserializing Avro message for id " + schemaId, e);
} catch (RestClientException e) {
throw new SerializationException("Error retrieving Avro schema for id " + schemaId, e);
}
}
}
The above is copied from https://stackoverflow.com/a/39617120/2534090
https://stackoverflow.com/a/42514352/2534090

Aggregate Java objects in a list with Kafka Streams DSL windows

I have the most straightforward of use cases for the Kafka Streams DSL: read in CSV sensor data, group by timestamp, and output. The following code does not compile:
public static void main(String[] args) {
StreamsConfig streamingConfig = new StreamsConfig(getProperties());
Serde<String> stringSerde = Serdes.String();
CSVDeserializer<SensorData> sensorDataDeserializer = new CSVDeserializer<>(SensorData.class);
JsonSerializer<SensorData> sensorDataSerializer = new JsonSerializer<>();
Serde sensorDataSerde = Serdes.serdeFrom(sensorDataSerializer, sensorDataDeserializer);
JsonDeserializer<SensorData> sensorDataJsonDeserializer = new JsonDeserializer<>(SensorData.class);
Serde sensorDataJSONSerde = Serdes.serdeFrom(sensorDataSerializer, sensorDataJsonDeserializer);
StringSerializer stringSerializer = new StringSerializer();
StringDeserializer stringDeserializer = new StringDeserializer();
WindowedSerializer<String> windowedSerializer = new WindowedSerializer<>(stringSerializer);
WindowedDeserializer<String> windowedDeserializer = new WindowedDeserializer<>(stringDeserializer);
Serde<Windowed<String>> windowedSerde = Serdes.serdeFrom(windowedSerializer, windowedDeserializer);
JsonSerializer<SensorDataAccumulator> accSerializer = new JsonSerializer<>();
JsonDeserializer accDeserializer = new JsonDeserializer<>(SensorDataAccumulator.class);
Serde<SensorDataAccumulator> accSerde = Serdes.serdeFrom(accSerializer, accDeserializer);
KStreamBuilder kStreamBuilder = new KStreamBuilder();
KStream<String,SensorData> initialStream = kStreamBuilder.stream(stringSerde,sensorDataSerde,"e40_orig");
final KStream<String, SensorData> sensorDataKStream = initialStream
.filter((k, v) -> (v != null))
.map((k, v) -> new KeyValue<>(v.getMeasurementDateTime().toString(), v));
sensorDataKStream
.filter((k, v) -> (v != null))
.groupBy((k,v) -> k, stringSerde, sensorDataJSONSerde)
.aggregate(SensorDataAccumulator::new,
==> error (k, v, list) -> list.add(v), //CHANGED THIS -->((SensorDataAccumulator)list).add((SensorData)v),
TimeWindows.of(10000),
accSerde, "acc")
.to(windowedSerde, accSerde, "out");
KafkaStreams kafkaStreams = new KafkaStreams(kStreamBuilder,streamingConfig);
kafkaStreams.start();
}
due to
Error:(90, 45) java: cannot find symbol
  symbol:   method add(java.lang.Object)
  location: variable list of type java.lang.Object
Weird.
public class SensorDataAccumulator {
ArrayList list = new ArrayList<SensorData>();
public SensorDataAccumulator add(SensorData s) {
list.add(s);
return this;
}
Casting as in the comment leads to the following runtime exception (right before outputting the windowed accumulation).
[2017-01-02 13:00:45,614] INFO task [1_0] Initializing processor nodes of the topology (org.apache.kafka.streams.processor.internals.StreamTask:123)
[2017-01-02 13:01:04,173] WARN Error while fetching metadata with correlation id 779 : {out=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient:600)
[2017-01-02 13:01:04,662] INFO stream-thread [StreamThread-1] Shutting down (org.apache.kafka.streams.processor.internals.StreamThread:268)
[2017-01-02 13:01:04,663] INFO stream-thread [StreamThread-1] Committing consumer offsets of task 0_0 (org.apache.kafka.streams.processor.internals.StreamThread:358)
[2017-01-02 13:01:04,666] INFO stream-thread [StreamThread-1] Committing consumer offsets of task 1_0 (org.apache.kafka.streams.processor.internals.StreamThread:358)
[2017-01-02 13:01:04,668] INFO stream-thread [StreamThread-1] Closing a task 0_0 (org.apache.kafka.streams.processor.internals.StreamThread:751)
[2017-01-02 13:01:04,668] INFO stream-thread [StreamThread-1] Closing a task 1_0 (org.apache.kafka.streams.processor.internals.StreamThread:751)
[2017-01-02 13:01:04,668] INFO stream-thread [StreamThread-1] Flushing state stores of task 0_0 (org.apache.kafka.streams.processor.internals.StreamThread:368)
[2017-01-02 13:01:04,669] INFO stream-thread [StreamThread-1] Flushing state stores of task 1_0 (org.apache.kafka.streams.processor.internals.StreamThread:368)
Exception in thread "StreamThread-1" java.lang.NoSuchMethodError: org.rocksdb.RocksIterator.close()V
at org.apache.kafka.streams.state.internals.RocksDBStore$RocksDbIterator.close(RocksDBStore.java:468)
at org.apache.kafka.streams.state.internals.RocksDBStore.closeOpenIterators(RocksDBStore.java:411)
at org.apache.kafka.streams.state.internals.RocksDBStore.close(RocksDBStore.java:397)
at org.apache.kafka.streams.state.internals.RocksDBWindowStore.close(RocksDBWindowStore.java:276)
at org.apache.kafka.streams.state.internals.MeteredWindowStore.close(MeteredWindowStore.java:109)
at org.apache.kafka.streams.state.internals.CachingWindowStore.close(CachingWindowStore.java:125)
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:349)
at org.apache.kafka.streams.processor.internals.AbstractTask.closeStateManager(AbstractTask.java:120)
at org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:348)
at org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:328)
at org.apache.kafka.streams.processor.internals.StreamThread.closeAllStateManagers(StreamThread.java:344)
at org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:305)
at org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:269)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:252)
[2017-01-02 13:01:05,316] INFO stream-thread [StreamThread-1] Closing the state manager of task 0_0 (org.apache.kafka.streams.processor.internals.StreamThread:347)
[2017-01-02 13:01:05,316] INFO stream-thread [StreamThread-1] Closing the state manager of task 1_0 (org.apache.kafka.streams.processor.internals.StreamThread:347)
Debugging the add method of SensorDataAccumulator gives a clue.
So, if I understand correctly, I'm keeping an ArrayList list = new ArrayList<SensorData>(), but actually, somewhere in the process its members are changed to LinkedTreeMap. The type checker lost me here...
OK, LinkedTreeMap is the underlying data structure Gson uses in my JsonDeserializer and JsonSerializer classes, so I'll add these below for completeness.
Currently I'm not sure what I'm doing wrong or where to fix it. Should I use different serializers, different data structures? A different language ;)?
Any input is welcome.
public class JsonSerializer<T> implements Serializer<T> {
private Gson gson = new Gson();
@Override
public void configure(Map<String, ?> map, boolean b) {
}
@Override
public byte[] serialize(String topic, T t) {
return gson.toJson(t).getBytes(Charset.forName("UTF-8"));
}
@Override
public void close() {
}
}
public class JsonDeserializer<T> implements Deserializer<T> {
private Gson gson = new Gson();
private Class<T> deserializedClass;
public JsonDeserializer(Class<T> deserializedClass) {
this.deserializedClass = deserializedClass;
}
public JsonDeserializer() {
}
@Override
public void configure(Map<String, ?> map, boolean b) {
if(deserializedClass == null) {
deserializedClass = (Class<T>) map.get("serializedClass");
}
}
@Override
public T deserialize(String s, byte[] bytes) {
if(bytes == null){
return null;
}
return gson.fromJson(new String(bytes),deserializedClass);
}
@Override
public void close() {
}
}
