How to deserialize Avro data using Apache Beam (KafkaIO) - Java

I've only found one thread with information on this topic:
How to Deserialising Kafka AVRO messages using Apache Beam
However, after trying a few variations of Kafka deserializers, I still cannot deserialize the Kafka messages. Here's my code:
public class Readkafka {
    private static final Logger LOG = LoggerFactory.getLogger(Readkafka.class);

    public static void main(String[] args) throws IOException {
        // Create the Pipeline object with the options we defined above.
        Pipeline p = Pipeline.create(
                PipelineOptionsFactory.fromArgs(args).withValidation().create());

        PTransform<PBegin, PCollection<KV<action_states_pkey, String>>> kafka =
                KafkaIO.<action_states_pkey, String>read()
                        .withBootstrapServers("mybootstrapserver")
                        .withTopic("action_States")
                        .withKeyDeserializer(MyClassKafkaAvroDeserializer.class)
                        .withValueDeserializer(StringDeserializer.class)
                        .updateConsumerProperties(ImmutableMap.of("schema.registry.url", (Object) "schemaregistryurl"))
                        .withMaxNumRecords(5)
                        .withoutMetadata();

        p.apply(kafka)
                .apply(Keys.<action_states_pkey>create());
    }
}
where MyClassKafkaAvroDeserializer is
public class MyClassKafkaAvroDeserializer extends
        AbstractKafkaAvroDeserializer implements Deserializer<action_states_pkey> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        configure(new KafkaAvroDeserializerConfig(configs));
    }

    @Override
    public action_states_pkey deserialize(String s, byte[] bytes) {
        return (action_states_pkey) this.deserialize(bytes);
    }

    @Override
    public void close() {}
}
and the class action_states_pkey is code generated from avro tools using
java -jar pathtoavrotools/avro-tools-1.8.1.jar compile schema pathtoschema/action_states_pkey.avsc destination path
where the action_states_pkey.avsc is literally
{"type":"record","name":"action_states_pkey","namespace":"namespace","fields":[{"name":"ad_id","type":["null","int"]},{"name":"action_id","type":["null","int"]},{"name":"state_id","type":["null","int"]}]}
With this code I'm getting the error:
Caused by: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to my.mudah.beam.test.action_states_pkey
at my.mudah.beam.test.MyClassKafkaAvroDeserializer.deserialize(MyClassKafkaAvroDeserializer.java:20)
at my.mudah.beam.test.MyClassKafkaAvroDeserializer.deserialize(MyClassKafkaAvroDeserializer.java:1)
at org.apache.beam.sdk.io.kafka.KafkaUnboundedReader.advance(KafkaUnboundedReader.java:221)
at org.apache.beam.sdk.io.BoundedReadFromUnboundedSource$UnboundedToBoundedSourceAdapter$Reader.advanceWithBackoff(BoundedReadFromUnboundedSource.java:279)
at org.apache.beam.sdk.io.BoundedReadFromUnboundedSource$UnboundedToBoundedSourceAdapter$Reader.start(BoundedReadFromUnboundedSource.java:256)
at com.google.cloud.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:592)
... 14 more
It seems there's an error when trying to map the Avro data to my custom class?
Alternatively, I've tried the following code:
PTransform<PBegin, PCollection<KV<action_states_pkey, String>>> kafka =
        KafkaIO.<action_states_pkey, String>read()
                .withBootstrapServers("bootstrapserver")
                .withTopic("action_states")
                .withKeyDeserializerAndCoder((Class) KafkaAvroDeserializer.class, AvroCoder.of(action_states_pkey.class))
                .withValueDeserializer(StringDeserializer.class)
                .updateConsumerProperties(ImmutableMap.of("schema.registry.url", (Object) "schemaregistry"))
                .withMaxNumRecords(5)
                .withoutMetadata();

p.apply(kafka)
        .apply(Keys.<action_states_pkey>create())
//        .apply("ExtractWords", ParDo.of(new DoFn<action_states_pkey, String>() {
//            @ProcessElement
//            public void processElement(ProcessContext c) {
//                action_states_pkey key = c.element();
//                c.output(key.getAdId().toString());
//            }
//        }));
which does not give me any error until I try to print out the data. I have to verify that I'm successfully reading the data one way or another, so my intent here is to log the data in the console. If I uncomment the commented section I get the same error once again:
SEVERE: 2019-09-13T07:53:56.168Z: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to my.mudah.beam.test.action_states_pkey
at my.mudah.beam.test.Readkafka$1.processElement(Readkafka.java:151)
Another thing to note is that specifying
.updateConsumerProperties(ImmutableMap.of("specific.avro.reader", (Object)"true"))
always gives me the error
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 443
Caused by: org.apache.kafka.common.errors.SerializationException: Could not find class NAMESPACE.action_states_pkey specified in writer's schema whilst finding reader's schema for a SpecificRecord.
It seems there's something wrong with my approach?
If anyone has any experience reading Avro data from Kafka streams using Apache Beam, please do help me out. I greatly appreciate it.
Here's a snapshot of my package with the schema and class in it as well:
[screenshot: package/working path details]
Thanks.

public class MyClassKafkaAvroDeserializer extends
AbstractKafkaAvroDeserializer
Your class extends AbstractKafkaAvroDeserializer, which returns a GenericRecord.
You need to convert that GenericRecord to your custom object, for example along the lines of the sketch below.
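As a rough sketch of that conversion (my own illustration, untested; the class name is made up, GenericRecord and SpecificData come from org.apache.avro, and it assumes the generated action_states_pkey class is on the classpath with an Avro namespace matching the writer's schema), the key deserializer could deep-copy the GenericRecord into the generated SpecificRecord:

public class ActionStatesPkeyDeserializer extends AbstractKafkaAvroDeserializer
        implements Deserializer<action_states_pkey> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        configure(new KafkaAvroDeserializerConfig(configs));
    }

    @Override
    public action_states_pkey deserialize(String topic, byte[] bytes) {
        // Without "specific.avro.reader", the Confluent deserializer returns a GenericRecord.
        GenericRecord record = (GenericRecord) this.deserialize(bytes);
        if (record == null) {
            return null;
        }
        // SpecificData resolves the record schema's full name to the generated class
        // and copies the fields over.
        return (action_states_pkey) SpecificData.get()
                .deepCopy(action_states_pkey.getClassSchema(), record);
    }

    @Override
    public void close() {
    }
}

Note that if SpecificData cannot resolve the class (for instance because the writer's schema namespace and the generated package don't match, which the "Could not find class NAMESPACE.action_states_pkey" error above hints at), it falls back to another GenericData.Record and the cast fails again, so the namespaces have to line up first.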
OR
Use SpecificRecord for this as stated in one of the following answers:
/**
 * Extends deserializer to support ReflectData.
 *
 * @param <V>
 *            value type
 */
public abstract class ReflectKafkaAvroDeserializer<V> extends KafkaAvroDeserializer {

    private Schema readerSchema;
    private DecoderFactory decoderFactory = DecoderFactory.get();

    protected ReflectKafkaAvroDeserializer(Class<V> type) {
        readerSchema = ReflectData.get().getSchema(type);
    }

    @Override
    protected Object deserialize(
            boolean includeSchemaAndVersion,
            String topic,
            Boolean isKey,
            byte[] payload,
            Schema readerSchemaIgnored) throws SerializationException {

        if (payload == null) {
            return null;
        }

        int schemaId = -1;
        try {
            ByteBuffer buffer = ByteBuffer.wrap(payload);
            if (buffer.get() != MAGIC_BYTE) {
                throw new SerializationException("Unknown magic byte!");
            }
            schemaId = buffer.getInt();
            Schema writerSchema = schemaRegistry.getByID(schemaId);

            int start = buffer.position() + buffer.arrayOffset();
            int length = buffer.limit() - 1 - idSize;
            DatumReader<Object> reader = new ReflectDatumReader(writerSchema, readerSchema);
            BinaryDecoder decoder = decoderFactory.binaryDecoder(buffer.array(), start, length, null);
            return reader.read(null, decoder);
        } catch (IOException e) {
            throw new SerializationException("Error deserializing Avro message for id " + schemaId, e);
        } catch (RestClientException e) {
            throw new SerializationException("Error retrieving Avro schema for id " + schemaId, e);
        }
    }
}
The above is copied from https://stackoverflow.com/a/39617120/2534090
https://stackoverflow.com/a/42514352/2534090
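For completeness, here is a rough sketch (my own, untested) of how the abstract deserializer above could be plugged into the KafkaIO read from the question; the concrete subclass name is hypothetical, and the raw (Class) cast mirrors the one already used with withKeyDeserializerAndCoder:

// Hypothetical concrete subclass for the generated key type:
public class ActionStatesPkeyReflectDeserializer
        extends ReflectKafkaAvroDeserializer<action_states_pkey> {

    public ActionStatesPkeyReflectDeserializer() {
        super(action_states_pkey.class);
    }
}

// ...wired into the pipeline together with the coder from the second attempt:
PTransform<PBegin, PCollection<KV<action_states_pkey, String>>> kafka =
        KafkaIO.<action_states_pkey, String>read()
                .withBootstrapServers("mybootstrapserver")
                .withTopic("action_states")
                .withKeyDeserializerAndCoder(
                        (Class) ActionStatesPkeyReflectDeserializer.class,
                        AvroCoder.of(action_states_pkey.class))
                .withValueDeserializer(StringDeserializer.class)
                .updateConsumerProperties(ImmutableMap.of("schema.registry.url", (Object) "schemaregistryurl"))
                .withMaxNumRecords(5)
                .withoutMetadata();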

Related

Got a java.lang.IllegalArgumentException when sending a Java object by the dynamic TCP/IP integration flow?

Gary Russell helped me some time ago with the following 'DynamicTcpServer' flow (see Building a TCP/IP server with SI's dynamic flow registration), which now has a message service injected that provides the message to send as soon as a client connects:
public class DynamicTcpServer implements TcpServer {
@Autowired
private IntegrationFlowContext flowContext;
@Autowired
private ApplicationContext appContext;
private final Map<String, IntegrationFlowRegistration> registrations = new HashMap<>();
private final Map<String, String> clients = new ConcurrentHashMap<>();
private final Map<String, TcpServerSpec> sockets;
private final MessageService messenger;
@Autowired
public DynamicTcpServer(MessageService messenger, Map<String, TcpServerSpec> sockets) {
this.messenger = messenger;
this.sockets = sockets;
}
@Override
public void start(String context) {
start(context, sockets.get(context).getPort());
}
@Override
public void start(String context, int port) {
if (this.registrations.containsKey(context)) {
/* already running */
}
else {
TcpServerConnectionFactorySpec server = Tcp.netServer(port).id(context).serializer(TcpCodecs.lf());
server.get().registerListener(msg -> false); // dummy listener so the accept thread doesn't exit
IntegrationFlow flow = f -> f.handle(Tcp.outboundAdapter(server));
this.registrations.put(context, flowContext.registration(flow).register());
}
}
@Override
public Set<String> running() {
return registrations.keySet();
}
@Override
public void stop(String context) {
IntegrationFlowRegistration registration = this.registrations.remove(context);
if (registration != null) {
registration.destroy();
}
}
@EventListener
public void connect(TcpConnectionOpenEvent event) {
String connectionId = event.getConnectionId();
this.clients.put(connectionId, event.getConnectionFactoryName());
}
@EventListener
public void closed(TcpConnectionCloseEvent event) {
this.clients.remove(event.getConnectionId());
}
@EventListener
public void listening(TcpConnectionServerListeningEvent event) {
}
@Scheduled(
fixedDelayString = "${com.harry.potter.scheduler.fixed-delay}",
initialDelayString = "${com.harry.potter.scheduler.initial-delay}"
)
public void sender() {
this.clients.forEach((connectId, context) -> {
IntegrationFlowRegistration register = registrations.get(context);
if (register != null) {
try {
while (true) {
List<ServerMessage> msgs = messenger.getMessagesToSend(sockets.get(context));
msgs.stream().forEach(msg ->
register.getMessagingTemplate().send(
MessageBuilder.withPayload(msg)
.setHeader(IpHeaders.CONNECTION_ID, connectId).build()));
}
}
catch (NoMessageToSendException nm) {
appContext.getBean(context, TcpNetServerConnectionFactory.class)
.closeConnection(connectId);
}
}
});
}
}
The message service returns a Java object 'com.harry.potter.entity.ServerMessage' to be sent.
So I assume I have to add some other kind of converter at '.serializer(TcpCodecs.lf())' because I got an exception saying:
2022-04-17 04:00:45.729 DEBUG [] --- [pool-283-thread-1] c.l.c.c.cas.service.DynamicTcpServer : sender: send 1 messages to potter1
2022-04-17 04:00:45.738 DEBUG [] --- [pool-283-thread-1] c.l.c.c.c.service.DynamicTcpServer : closed event=TcpConnectionCloseEvent [source=TcpNetConnection:harry.potter.de:56746:17584:76adefe0-0881-4e4b-be2b-0ced47f950ae], [factory=potter1, connectionId=harry.potter.de:56746:17584:76adefe0-0881-4e4b-be2b-0ced47f950ae] **CLOSED**
2022-04-17 04:00:45.740 ERROR [] --- [pool-283-thread-1] o.s.i.ip.tcp.TcpSendingMessageHandler : Error sending message
org.springframework.messaging.MessagingException: Send Failed; nested exception is java.lang.IllegalArgumentException: When using a byte array serializer, the socket mapper expects either a byte array or String payload, but received: class com.harry.potter.entity.ServerMessage
at org.springframework.integration.ip.tcp.connection.TcpNetConnection.send(TcpNetConnection.java:118)
at org.springframework.integration.ip.tcp.TcpSendingMessageHandler.handleMessageAsServer(TcpSendingMessageHandler.java:119)
at org.springframework.integration.ip.tcp.TcpSendingMessageHandler.handleMessageInternal(TcpSendingMessageHandler.java:103)
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:62)
at org.springframework.integration.dispatcher.AbstractDispatcher.tryOptimizedDispatch(AbstractDispatcher.java:115)
at org.springframework.integration.dispatcher.UnicastingDispatcher.doDispatch(UnicastingDispatcher.java:133)
at org.springframework.integration.dispatcher.UnicastingDispatcher.dispatch(UnicastingDispatcher.java:106)
at org.springframework.integration.channel.AbstractSubscribableChannel.doSend(AbstractSubscribableChannel.java:72)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:570)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:520)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:187)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:166)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:47)
at org.springframework.messaging.core.AbstractMessageSendingTemplate.send(AbstractMessageSendingTemplate.java:109)
at org.springframework.messaging.core.AbstractMessageSendingTemplate.send(AbstractMessageSendingTemplate.java:99)
at com.harry.potter.service.DynamicTcpServer.lambda$sender$2(DynamicTcpServer.java:125)
at com.harry.potter.service.DynamicTcpServer$$Lambda$40600/0x000000006f511b08.accept(Unknown Source)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762)
at com.harry.potter.service.DynamicTcpServer.lambda$sender$3(DynamicTcpServer.java:124)
at com.harry.potter.service.DynamicTcpServer$$Lambda$40552/0x000000003344f3b0.accept(Unknown Source)
at java.base/java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1603)
at com.harry.potter.service.DynamicTcpServer.sender(DynamicTcpServer.java:115)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84)
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:884)
Caused by: java.lang.IllegalArgumentException: When using a byte array serializer, the socket mapper expects either a byte array or String payload, but received: class com.harry.potter.entity.ServerMessage
at org.springframework.integration.ip.tcp.connection.TcpMessageMapper.getPayloadAsBytes(TcpMessageMapper.java:277)
at org.springframework.integration.ip.tcp.connection.TcpMessageMapper.fromMessage(TcpMessageMapper.java:252)
at org.springframework.integration.ip.tcp.connection.TcpNetConnection.send(TcpNetConnection.java:111)
... 34 common frames omitted
Which converter (serializer) do I have to use and how to plug it in my DynamicTcpServer exactly?
EDIT 1
The message service messenger returns a Java object 'com.harry.potter.entity.ServerMessage' to be sent. The ServerMessage contains an int field holding the message length and a String field holding the message text:
public class ServerMessage implements Serializable {
private static final long serialVersionUID = -1L;
private int len;
private String message;
/* getters & setters */
}
I am trying to migrate from a C/C++ function which writes the C Struct
struct C_MSG
{
int len; /* Length field */
char text[MAX_MSG_LEN]; /* Data field */
} c_msg;
to a consumer, using the C socket library's send function to write a given number of bytes (length of text + 4) from the given memory address to the TCP/IP socket.
I am looking for a Transformer to prepare the same binary content for the message consumer. Otherwise the consumer will not be able to cope with the message.
Following the comments and looking at GenericTransformer<S, T>, the transformation could be done in a single lambda expression. The source of the transformation would be a ServerMessage object, and the result should be a byte array, using Spring's utility:
.transform(s -> SerializationUtils.serialize(s))
Would that be the right lambda expression? Or do I need a custom Transformer, implementing some specific interface (which one?), to have more control over the serialization in case my consumer expects Intel (little-endian) or Motorola (big-endian) byte order? Or is there a much easier solution?
You need to add a .transform() before that .handle(Tcp.outboundAdapter(server)) to convert your ServerMessage to a byte[] or String; that is what the TcpMessageMapper expects by default.
Of course I could point you to the mapper(TcpMessageMapper mapper) option of Tcp.netServer() and its bytesMessageMapper property, but the outcome would be just the same.
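As a rough sketch of such a .transform() (my own illustration, untested; it assumes the ServerMessage getters shown earlier, an ASCII text encoding, and a big-endian consumer, so switch the ByteOrder if the C side is little-endian), the flow registered in start() could build the same length-prefixed layout as the C_MSG struct before the adapter sends it:

IntegrationFlow flow = f -> f
        .transform(ServerMessage.class, msg -> {
            byte[] text = msg.getMessage().getBytes(java.nio.charset.StandardCharsets.US_ASCII);
            // 4-byte int length field followed by the text, matching the C_MSG struct
            return java.nio.ByteBuffer.allocate(4 + text.length)
                    .order(java.nio.ByteOrder.BIG_ENDIAN)
                    .putInt(text.length)
                    .put(text)
                    .array();
        })
        .handle(Tcp.outboundAdapter(server));

Keep in mind that the .serializer(TcpCodecs.lf()) shown earlier still appends a line feed after these bytes; if the C consumer reads exactly len + 4 bytes, a raw serializer may be a better fit.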

Java - Flink sending empty object on kafka sink

In my Flink script I have a stream that I'm getting from one Kafka topic; I manipulate it and send it back to Kafka using the sink.
public static void main(String[] args) throws Exception {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
Properties p = new Properties();
p.setProperty("bootstrap.servers", servers_ip_list);
p.setProperty("gropu.id", "Flink");
FlinkKafkaConsumer<Event_N> kafkaData_N =
new FlinkKafkaConsumer("CorID_0", new Ev_Des_Sch_N(), p);
WatermarkStrategy<Event_N> wmStrategy =
WatermarkStrategy
.<Event_N>forMonotonousTimestamps()
.withIdleness(Duration.ofMinutes(1))
.withTimestampAssigner((Event, timestamp) -> {
return Event.get_Time();
});
DataStream<Event_N> stream_N = env.addSource(
kafkaData_N.assignTimestampsAndWatermarks(wmStrategy));
The part above works fine, no problems at all; the part below is where I'm getting the issue.
String ProducerTopic = "CorID_0_f1";
DataStream<Stream_Blocker_Pojo.block> box_stream_p= stream_N
.keyBy((Event_N CorrID) -> CorrID.get_CorrID())
.map(new Stream_Blocker_Pojo());
FlinkKafkaProducer<Stream_Blocker_Pojo.block> myProducer = new FlinkKafkaProducer<>(
ProducerTopic,
new ObjSerializationSchema(ProducerTopic),
p,
FlinkKafkaProducer.Semantic.EXACTLY_ONCE); // fault-tolerance
box_stream_p.addSink(myProducer);
No errors, everything works fine. This is the Stream_Blocker_Pojo where I map a stream, manipulate it, and send out a new one. (I have simplified my code, keeping just 4 variables and removing all the math and data processing.)
public class Stream_Blocker_Pojo extends RichMapFunction<Event_N, Stream_Blocker_Pojo.block>
{
public class block {
public Double block_id;
public Double block_var2 ;
public Double block_var3;
public Double block_var4;}
private transient ValueState<block> state_a;
@Override
public void open(Configuration parameters) throws Exception {
state_a = getRuntimeContext().getState(new ValueStateDescriptor<>("BoxState_a", block.class));
}
public block map(Event_N input) throws Exception {
p1.Stream_Blocker_Pojo.block current_a = state_a.value();
if (current_a == null) {
current_a = new p1.Stream_Blocker_Pojo.block();
current_a.block_id = 0.0;
current_a.block_var2 = 0.0;
current_a.block_var3 = 0.0;
current_a.block_var4 = 0.0;}
current_a.block_id = input.f_num_id;
current_a.block_var2 = input.f_num_2;
current_a.block_var3 = input.f_num_3;
current_a.tblock_var4 = input.f_num_4;
state_a.update(current_a);
return new block();
};
}
This is the implementation of the Kafka Serialization schema.
public class ObjSerializationSchema implements KafkaSerializationSchema<Stream_Blocker_Pojo.block>{
private String topic;
private ObjectMapper mapper;
public ObjSerializationSchema(String topic) {
super();
this.topic = topic;
}
@Override
public ProducerRecord<byte[], byte[]> serialize(Stream_Blocker_Pojo.block obj, Long timestamp) {
byte[] b = null;
if (mapper == null) {
mapper = new ObjectMapper();
}
try {
b= mapper.writeValueAsBytes(obj);
} catch (JsonProcessingException e) {
}
return new ProducerRecord<byte[], byte[]>(topic, b);
}
}
When I open the messages that I sent from my Flink script using Kafka, I find that all the variables are "null":
CorrID b'{"block_id":null,"block_var1":null,"block_var2":null,"block_var3":null,"block_var4":null}
It looks like I'm sending out an empty object with no values, but I'm struggling to understand what I'm doing wrong. I think the problem could be in my implementation of Stream_Blocker_Pojo or maybe in ObjSerializationSchema. Any help would be really appreciated. Thanks
There are two probable issues here:
Are you sure the variable of type block that you are passing doesn't have null fields? You may want to debug that part to be sure.
The reason may also be in the ObjectMapper: you should have getters and setters available for your block, otherwise Jackson may not be able to access the fields.
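As a small illustration of that second point (my own sketch, not part of the original answer; PropertyAccessor and JsonAutoDetect come from com.fasterxml.jackson): either add conventional getters and setters to block, or configure the mapper inside ObjSerializationSchema.serialize() to read public fields directly:

if (mapper == null) {
    mapper = new ObjectMapper();
    // Let Jackson serialize the public fields of Stream_Blocker_Pojo.block directly,
    // even without getters.
    mapper.setVisibility(PropertyAccessor.FIELD, JsonAutoDetect.Visibility.ANY);
}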

Can't seem to transform KStream<A,B> to KTable<X,Y>

This is my first attempt at trying to use a KTable. I have a Kafka Stream that contains Avro serialized objects of type A,B. And this works fine. I can write a Consumer that consumes just fine or a simple KStream that simply counts records.
The B object has a field containing a country code. I'd like to supply that code to a KTable so it can count the number of records that contain a particular country code. To do so I'm trying to convert the stream into a stream of X,Y (or really: country-code, count). Eventually I look at the contents of the table and extract an array of KV pairs.
The code I have (included) always errors out with the following (see the line with 'Caused by'):
2018-07-26 13:42:48.688 [com.findology.tools.controller.TestEventGeneratorController-16d7cd06-4742-402e-a679-898b9ef78c41-StreamThread-1; AssignedStreamsTasks] ERROR -- stream-thread [com.findology.tools.controller.TestEventGeneratorController-16d7c\
d06-4742-402e-a679-898b9ef78c41-StreamThread-1] Failed to process stream task 0_0 due to the following error:
org.apache.kafka.streams.errors.StreamsException: Exception caught in process. taskId=0_0, processor=KSTREAM-SOURCE-0000000000, topic=com.findology.model.traffic.CpaTrackingCallback, partition=0, offset=962649
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:240)
at org.apache.kafka.streams.processor.internals.AssignedStreamsTasks.process(AssignedStreamsTasks.java:94)
at org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:411)
at org.apache.kafka.streams.processor.internals.StreamThread.processAndMaybeCommit(StreamThread.java:922)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:802)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:749)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:719)
Caused by: org.apache.kafka.streams.errors.StreamsException: A serializer (key: org.apache.kafka.common.serialization.ByteArraySerializer / value: org.apache.kafka.common.serialization.ByteArraySerializer) is not compatible to the actual key or value type (key type: java.lang.Integer / value type: java.lang.Integer). Change the default Serdes in StreamConfig or provide correct Serdes via method parameters.
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:92)
at org.apache.kafka.streams.processor.internals.AbstractProcessorContext.forward(AbstractProcessorContext.java:174)
at org.apache.kafka.streams.kstream.internals.KStreamFilter$KStreamFilterProcessor.process(KStreamFilter.java:43)
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:46)
at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:211)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:124)
at org.apache.kafka.streams.processor.internals.AbstractProcessorContext.forward(AbstractProcessorContext.java:174)
at org.apache.kafka.streams.kstream.internals.KStreamTransform$KStreamTransformProcessor.process(KStreamTransform.java:59)
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:46)
at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:211)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:124)
at org.apache.kafka.streams.processor.internals.AbstractProcessorContext.forward(AbstractProcessorContext.java:174)
at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:80)
at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:224)
... 6 more
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to [B
at org.apache.kafka.common.serialization.ByteArraySerializer.serialize(ByteArraySerializer.java:21)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:146)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.send(RecordCollectorImpl.java:94)
at org.apache.kafka.streams.processor.internals.SinkNode.process(SinkNode.java:87)
... 19 more
And here is the code I'm using. I've omitted certain classes for brevity. Note that I'm not using the Confluent KafkaAvro classes.
private synchronized void createStreamProcessor2() {
if (streams == null) {
try {
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, getClass().getName());
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
StreamsConfig config = new StreamsConfig(props);
StreamsBuilder builder = new StreamsBuilder();
Map<String, Object> serdeProps = new HashMap<>();
serdeProps.put("schema.registry.url", schemaRegistryURL);
AvroSerde<CpaTrackingCallback> cpaTrackingCallbackAvroSerde = new AvroSerde<>(schemaRegistryURL);
cpaTrackingCallbackAvroSerde.configure(serdeProps, false);
// This is the key to telling kafka the specific Serde instance to use
// to deserialize the Avro encoded value
KStream<Long, CpaTrackingCallback> stream = builder.stream(CpaTrackingCallback.class.getName(),
Consumed.with(Serdes.Long(), cpaTrackingCallbackAvroSerde));
// provide a way to convert CpsTrackicking... info into just country codes
// (Long, CpaTrackingCallback) -> (countryCode:Integer, placeHolder:Long)
TransformerSupplier<Long, CpaTrackingCallback, KeyValue<Integer, Long>> transformer = new TransformerSupplier<Long, CpaTrackingCallback, KeyValue<Integer, Long>>() {
@Override
public Transformer<Long, CpaTrackingCallback, KeyValue<Integer, Long>> get() {
return new Transformer<Long, CpaTrackingCallback, KeyValue<Integer, Long>>() {
@Override
public void init(ProcessorContext context) {
// Not doing Punctuate so no need to store context
}
@Override
public KeyValue<Integer, Long> transform(Long key, CpaTrackingCallback value) {
return new KeyValue(value.getCountryCode(), 1);
}
@Override
public KeyValue<Integer, Long> punctuate(long timestamp) {
return null;
}
@Override
public void close() {
}
};
}
};
KTable<Integer, Long> countryCounts = stream.transform(transformer).groupByKey() //
.count(Materialized.as("country-counts"));
streams = new KafkaStreams(builder.build(), config);
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
streams.cleanUp();
streams.start();
try {
countryCountsView = waitUntilStoreIsQueryable("country-counts", QueryableStoreTypes.keyValueStore(),
streams);
}
catch (InterruptedException e) {
log.warn("Interrupted while waiting for query store to become available", e);
}
}
catch (Exception e) {
log.error(e);
}
}
}
The bare groupByKey() method on KStream uses the default serializer/deserializer (which you haven't set). Use the method groupByKey(Serialized<K,V> serialized), as in:
.groupByKey(Serialized.with(Serdes.Integer(), Serdes.Long()))
Also note that what you do in your custom TransformerSupplier can be done more simply with a KStream.map call, as sketched below.
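Putting both points together, a rough sketch of the simplified topology (my own, untested; it keeps the Integer country code and Long count from the question):

KTable<Integer, Long> countryCounts = stream
        // replaces the TransformerSupplier: just re-key each record by country code
        .map((key, value) -> KeyValue.pair(value.getCountryCode(), 1L))
        // tell Streams how to (de)serialize the new Integer/Long key-value pair
        .groupByKey(Serialized.with(Serdes.Integer(), Serdes.Long()))
        .count(Materialized.as("country-counts"));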

How to convert pubsub payload to LogEntry object in log export

I have enabled log exports to a pub sub topic. I am using dataflow to process these logs and store relevant columns in BigQuery. Can someone please help with the conversion of the pubsub message payload to a LogEntry object.
I have tried the following code:
@ProcessElement
public void processElement(ProcessContext c) throws Exception {
PubsubMessage pubsubMessage = c.element();
ObjectMapper mapper = new ObjectMapper();
byte[] payload = pubsubMessage.getPayload();
String s = new String(payload, "UTF8");
LogEntry logEntry = mapper.readValue(s, LogEntry.class);
}
But I got the following error:
com.fasterxml.jackson.databind.JsonMappingException: Can not find a (Map) Key deserializer for type [simple type, class com.google.protobuf.Descriptors$FieldDescriptor]
Edit:
I tried the following code:
try {
ByteArrayInputStream stream = new ByteArrayInputStream(Base64.decodeBase64(pubsubMessage.getPayload()));
LogEntry logEntry = LogEntry.parseDelimitedFrom(stream);
System.out.println("Log Entry = " + logEntry);
} catch (InvalidProtocolBufferException e) {
e.printStackTrace();
}
But I get the following error now:
com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag
The JSON format parser should be able to do this. Java's not my strength, but I think you're looking for something like:
@ProcessElement
public void processElement(ProcessContext c) throws Exception {
    LogEntry.Builder entryBuilder = LogEntry.newBuilder();
    JsonFormat.parser()
        .usingTypeRegistry(
            JsonFormat.TypeRegistry.newBuilder()
                .add(LogEntry.getDescriptor())
                .build())
        .ignoringUnknownFields()
        .merge(new String(c.element().getPayload(), StandardCharsets.UTF_8), entryBuilder);
    LogEntry entry = entryBuilder.build();
    ...
}
You might be able to get away without registering the type. I think in C++ the proto types are linked into a global registry.
You'll want ignoringUnknownFields in case the service adds and exports new fields before you have updated your copy of the proto descriptor. Any "@type" fields in the exported JSON will cause problems too.
You may need special handling of the payload (i.e. strip it from the JSON and then parse it separately). If it's JSON I'd expect the parser to try populating sub-messages that don't exist. If it's proto ... it actually might work if you register the Any type too.
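Purely as an illustration of that last point (not verified; the "protoPayload" field name is an assumption about the export format, json is the UTF-8 payload string from above, and Jackson is only used to manipulate the raw JSON before handing it to JsonFormat):

ObjectMapper mapper = new ObjectMapper();
ObjectNode root = (ObjectNode) mapper.readTree(json);
// Pull the payload out and keep it for separate handling.
JsonNode payload = root.remove("protoPayload");
LogEntry.Builder entryBuilder = LogEntry.newBuilder();
JsonFormat.parser()
    .ignoringUnknownFields()
    .merge(mapper.writeValueAsString(root), entryBuilder);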

Java Bean Persistence with XMLEncoder

I've written a bean class containing a HashMultimap (from the Guava library). I would like to XML-encode the bean using the JRE's XMLEncoder. Using a custom PersistenceDelegate, I've successfully written the bean to file. However, when I attempt to deserialize the XML I get the exception:
java.lang.NoSuchMethodException: <unbound>=HashMultimap.put("pz1", "pz2")
What am I doing wrong?
// create the bean
SomeBean sb = new SomeBean();
// add some data
HashMultimap<String, String> stateMap = HashMultimap.create();
stateMap.put("pz1", "pz2");
stateMap.put("pz3", "pz4");
sb.setStateMap(stateMap);
// encode as xml
FileOutputStream os = new FileOutputStream("myXMLFile.xml");
XMLEncoder encoder = new XMLEncoder(os);
encoder.setPersistenceDelegate(HashMultimap.class, new CustomPersistenceDelegate());
encoder.writeObject(sb);
// decode the xml
FileInputStream is = new FileInputStream("myXMLFile.xml");
XMLDecoder decoder = new XMLDecoder(is);
Object deSerializedObject = decoder.readObject();
class CustomPersistenceDelegate extends DefaultPersistenceDelegate
{
protected Expression instantiate(Object oldInstance, Encoder out)
{
return new Expression(oldInstance, oldInstance.getClass(), "create", null);
}
protected void initialize(Class<?> type, Object oldInstance, Object newInstance,
Encoder out)
{
super.initialize(type, oldInstance, newInstance, out);
com.google.common.collect.HashMultimap<String, String> m =
(com.google.common.collect.HashMultimap) oldInstance;
for (Map.Entry<String, String> entry : m.entries())
{
out.writeStatement(new Statement(oldInstance, "put",
new Object[] { entry.getKey(), entry.getValue() }));
}
}
}
public class SomeBean
{
private HashMultimap<String, String> stateMap;
public HashMultimap<String, String> getStateMap()
{
return stateMap;
}
public void setStateMap(HashMultimap<String, String> stateMap)
{
this.stateMap = stateMap;
}
}
I don't have a solution (yet), but here is something which at least clarifies the problem. It seems that some change made in Java 7 build 15 and higher has broken the method lookup that your Statement requires. If you add an ExceptionListener to the XMLEncoder, it gives you a better idea of how this is failing:
encoder.setExceptionListener(new ExceptionListener() {
@Override
public void exceptionThrown(Exception e) {
System.out.println("got exception. e=" + e);
e.printStackTrace();
}
});
You will see a full stacktrace then:
java.lang.Exception: Encoder: discarding statement HashMultimap.put(Object, Object);
at java.beans.Encoder.writeStatement(Encoder.java:306)
at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:400)
at test2.XmlEncoderTest$CustomPersistenceDelegate.initialize(XmlEncoderTest.java:83)
at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
at java.beans.Encoder.writeObject(Encoder.java:74)
at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
at java.beans.Encoder.writeExpression(Encoder.java:330)
at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
at java.beans.Encoder.writeObject(Encoder.java:74)
at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
at java.beans.Encoder.writeExpression(Encoder.java:330)
at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
at java.beans.DefaultPersistenceDelegate.doProperty(DefaultPersistenceDelegate.java:194)
at java.beans.DefaultPersistenceDelegate.initBean(DefaultPersistenceDelegate.java:253)
at java.beans.DefaultPersistenceDelegate.initialize(DefaultPersistenceDelegate.java:400)
at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:118)
at java.beans.Encoder.writeObject(Encoder.java:74)
at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
at java.beans.Encoder.writeExpression(Encoder.java:330)
at java.beans.XMLEncoder.writeExpression(XMLEncoder.java:454)
at java.beans.PersistenceDelegate.writeObject(PersistenceDelegate.java:115)
at java.beans.Encoder.writeObject(Encoder.java:74)
at java.beans.XMLEncoder.writeObject(XMLEncoder.java:327)
at java.beans.Encoder.writeObject1(Encoder.java:258)
at java.beans.Encoder.cloneStatement(Encoder.java:271)
at java.beans.Encoder.writeStatement(Encoder.java:301)
at java.beans.XMLEncoder.writeStatement(XMLEncoder.java:400)
at java.beans.XMLEncoder.writeObject(XMLEncoder.java:330)
...
Caused by: java.lang.NoSuchMethodException: HashMultimap.put(Object, Object);
at java.beans.Statement.invokeInternal(Statement.java:313)
at java.beans.Statement.access$000(Statement.java:58)
at java.beans.Statement$2.run(Statement.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at java.beans.Statement.invoke(Statement.java:182)
at java.beans.Statement.execute(Statement.java:173)
at java.beans.Encoder.writeStatement(Encoder.java:304)
... 51 more
The Caused by section shows that it failed to locate the put method. It looks to me like this happens because it can't match the method signature properly any more. It fails in the java.beans MethodFinder, but since that source code is not included in the JDK, I couldn't track it down well enough.
If I can find exact cause, I will update this. Just wanted to provide you with more information in the meantime.
UPDATE
I think it's a bug in these later versions. Here is a unit test which exposes the bug (or unexpected behavior) more directly. The failure below is exactly what is happening in your code:
@Test
public void testMethodFinder() throws Exception {
Method m0 = MethodFinder.findMethod(this.getClass(), "setUp", new Class<?>[0]);
assertNotNull(m0);
// this is okay, because method is declared in the type referenced
Method m = MethodFinder.findMethod(Multimap.class, "put", new Class<?>[] { Object.class, Object.class });
assertNotNull(m);
try {
// this fails, apparently because method is not declared in this subclass (is inherited from parent class)
Method m2 = MethodFinder.findMethod(HashMultimap.class, "put", new Class<?>[] { Object.class, Object.class });
assertNotNull(m2);
} catch (Exception e) {
System.out.println("got exception. e=" + e);
}
}
