How to access Kafka headers while consuming a message? - java

Below is my configuration:
<int-kafka:inbound-channel-adapter id="kafkaInboundChannelAdapter"
        kafka-consumer-context-ref="consumerContext"
        auto-startup="true"
        channel="inputFromKafka">
    <int:poller fixed-delay="1" time-unit="MILLISECONDS" />
</int-kafka:inbound-channel-adapter>
inputFromKafka goes through the transformation below:
public Message<?> transform(final Message<?> message) {
    System.out.println("KAFKA Message Headers " + message.getHeaders());
    final Map<String, Map<Integer, List<Object>>> origData =
            (Map<String, Map<Integer, List<Object>>>) message.getPayload();
    // some code to figure out the nonPartitionedData
    return MessageBuilder.withPayload(nonPartitionedData).build();
}
The print statement above prints only the same two headers, regardless of what was sent:
KAFKA Message Headers {id=9c8f09e6-4b28-5aa1-c74c-ebfa53c01ae4, timestamp=1437066957272}
When sending the Kafka message, some headers were passed, including KafkaHeaders.MESSAGE_KEY, but I am not getting that back either. Is there a way to accomplish this?

Unfortunately it doesn't work that way...
The Producer part (KafkaProducerMessageHandler) looks like this:
this.kafkaProducerContext.send(topic, partitionId, messageKey, message.getPayload());
As you can see, we don't send any message headers to the Kafka topic: only the payload, stored under that messageKey as specified by the Kafka protocol.
On the other side, the consumer (KafkaHighLevelConsumerMessageSource) applies this logic:
if (!payloadMap.containsKey(messageAndMetadata.partition())) {
    final List<Object> payload = new ArrayList<Object>();
    payload.add(messageAndMetadata.message());
    payloadMap.put(messageAndMetadata.partition(), payload);
}
As you can see, the messageKey isn't taken into account here either.
The KafkaMessageDrivenChannelAdapter (<int-kafka:message-driven-channel-adapter>) is for you! It does this before sending the message to the channel:
KafkaMessageHeaders kafkaMessageHeaders = new KafkaMessageHeaders(generateMessageId, generateTimestamp);
Map<String, Object> rawHeaders = kafkaMessageHeaders.getRawHeaders();
rawHeaders.put(KafkaHeaders.MESSAGE_KEY, key);
rawHeaders.put(KafkaHeaders.TOPIC, metadata.getPartition().getTopic());
rawHeaders.put(KafkaHeaders.PARTITION_ID, metadata.getPartition().getId());
rawHeaders.put(KafkaHeaders.OFFSET, metadata.getOffset());
rawHeaders.put(KafkaHeaders.NEXT_OFFSET, metadata.getNextOffset());
if (!this.autoCommitOffset) {
    rawHeaders.put(KafkaHeaders.ACKNOWLEDGMENT, acknowledgment);
}
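With that adapter feeding your channel, the transformer from the question can read those entries straight from the MessageHeaders. A rough sketch, using only the header names shown in the snippet above:
public Message<?> transform(final Message<?> message) {
    // Headers populated by the message-driven adapter, per the snippet above
    Object key = message.getHeaders().get(KafkaHeaders.MESSAGE_KEY);
    Object topic = message.getHeaders().get(KafkaHeaders.TOPIC);
    Object partition = message.getHeaders().get(KafkaHeaders.PARTITION_ID);
    Object offset = message.getHeaders().get(KafkaHeaders.OFFSET);
    System.out.println("key=" + key + " topic=" + topic + " partition=" + partition + " offset=" + offset);
    return message;
}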

As stated before, there's no concept of message headers in Kafka. Because I struggled with that same problem in the past, I've compiled a small library that helps tackle this issue. It may come in handy to you.
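A common workaround, and presumably what such a library automates (this is only an illustrative sketch, not that library's API), is to wrap each payload in an envelope that carries its own header map and serialize the envelope as the Kafka message value:
import java.io.Serializable;
import java.util.Map;

// Illustrative envelope type; serialize it with whatever format you already use (JSON, protobuf, ...)
public class KafkaEnvelope implements Serializable {

    private final Map<String, String> headers;
    private final byte[] payload;

    public KafkaEnvelope(Map<String, String> headers, byte[] payload) {
        this.headers = headers;
        this.payload = payload;
    }

    public Map<String, String> getHeaders() {
        return headers;
    }

    public byte[] getPayload() {
        return payload;
    }
}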

Related

Write camel routes running after last processor ends

I have three components: REST, Cassandra and Kafka, and I am using Apache Camel. When a request is received, I want to add a record to Cassandra, then add that record to Kafka, and finally generate the REST response. Maybe a pipeline is not a good solution for me, because the Cassandra part is InOnly and doesn't have any out exchange!
I wrote this route:
rest().path("/addData")
        .consumes("text/plain")
        .produces("application/json")
        .post()
        .to("direct:requestData");

from("direct:requestData")
        .pipeline("direct:init",
                "cql://localhost/myTable?cql=" + CQL,
                "direct:addToKafka")
        .process(exchange -> {
            var currentBody = (List<?>) exchange.getIn().getBody();
            var body = new Data((String) currentBody.get(0), (Long) currentBody.get(1), (String) currentBody.get(2));
            exchange.getIn().setBody(body.toJsonString());
        });

from("direct:init")
        .process(exchange -> {
            var currentBody = exchange.getIn().getBody();
            var body = Arrays.asList(generateId(), generateTimeStamp(), currentBody);
            exchange.getIn().setBody(body);
        });

from("direct:addToKafka")
        .process(exchange -> {
            // do something to add to Kafka
        });
I tried things such as setting patternExtention to InOut for Cassandra, but finally understood that this is impossible, because patternExtention is used for consumers and I use Cassandra in the route as a producer.
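One possible workaround, sketched only and reusing the question's CQL constant, generateId(), generateTimeStamp() and Data helpers: build the record once, stash it in an exchange property before the Cassandra and Kafka steps, and restore it before building the REST response, so the final processor no longer depends on what those InOnly steps return:
from("direct:requestData")
        .process(exchange -> {
            // build the record once and remember it for the later steps
            var currentBody = exchange.getIn().getBody();
            var record = Arrays.asList(generateId(), generateTimeStamp(), currentBody);
            exchange.setProperty("recordBody", record);
            exchange.getIn().setBody(record);
        })
        .to("cql://localhost/myTable?cql=" + CQL)
        .setBody(exchangeProperty("recordBody")) // ignore whatever the cql endpoint returned
        .to("direct:addToKafka")
        .process(exchange -> {
            // rebuild the REST response from the stashed record
            var record = (List<?>) exchange.getProperty("recordBody");
            var data = new Data((String) record.get(0), (Long) record.get(1), (String) record.get(2));
            exchange.getIn().setBody(data.toJsonString());
        });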

How to create a Pub/Sub topic with a customer-managed encryption key in Java

I am trying to create a Pub/Sub topic with a customer-managed encryption key (CMEK) in Java.
In Python we can create a topic with the CMEK location passed as a parameter, like this:
topic = client.create_topic(
    topic_path,
    kms_key_name=cmek_location,
    message_storage_policy=get_allowed_region()
)
In Java I am using the following:
TopicAdminClient topicAdminClient = TopicAdminClient.create(topicAdminSettings);
topicAdminClient.createTopic(topic);
How can we use the CMEK location in java code?
For that purpose you can use the following code, extracted from the createTopic method documentation:
try (TopicAdminClient topicAdminClient = TopicAdminClient.create()) {
    Topic request =
        Topic.newBuilder()
            .setName(TopicName.ofProjectTopicName("[PROJECT]", "[TOPIC]").toString())
            .putAllLabels(new HashMap<String, String>())
            .setMessageStoragePolicy(MessageStoragePolicy.newBuilder().build())
            .setKmsKeyName("kmsKeyName412586233")
            .setSchemaSettings(SchemaSettings.newBuilder().build())
            .setSatisfiesPzs(true)
            .setMessageRetentionDuration(Duration.newBuilder().build())
            .build();
    Topic response = topicAdminClient.createTopic(request);
}
Basically you provide a template of the Topic you want to create.
In your use case I suppose it will look similar to this:
try (TopicAdminClient topicAdminClient = TopicAdminClient.create()) {
    Topic request =
        Topic.newBuilder()
            .setName(TopicName.ofProjectTopicName("[PROJECT]", "[TOPIC]").toString())
            .setKmsKeyName("kmsKeyName412586233") // cmek location
            .setMessageStoragePolicy(
                MessageStoragePolicy.newBuilder()
                    .addAllowedPersistenceRegions("us-central1") // get_allowed_region
                    .build())
            .build();
    Topic response = topicAdminClient.createTopic(request);
}
Please pay attention to the setKmsKeyName method.
The API is described in this GCP documentation.
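Note that setKmsKeyName expects the full Cloud KMS key resource name, not just a location. With hypothetical project, region, key ring and key names it looks like this:
// Hypothetical names; the format is projects/{project}/locations/{location}/keyRings/{keyRing}/cryptoKeys/{key}
String kmsKeyName =
        "projects/my-project/locations/us-central1/keyRings/my-key-ring/cryptoKeys/my-key";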

Kafka Protobuf: C++ serialization to java

I've developed a couple of C++ apps that produce and consume Kafka messages (using cppkafka) embedding Protobuf3 messages. Both work fine. The producer's relevant code is:
std::string kafkaString;
cppkafka::MessageBuilder *builder;
...
solidList->SerializeToString(&kafkaString);
builder->payload(kafkaString);
Protobuf objects are serialized to string and inserted as Kafka payload. Everything works fine up to this point. Now, I'm trying to develop a consumer for that in Java. The relevant code should be:
KafkaConsumer<Long, String> consumer = new KafkaConsumer<Long, String>(properties);
....
ConsumerRecords<Long, String> records = consumer.poll(100);
for (ConsumerRecord<Long, String> record : records) {
    SolidList solidList = SolidList.parseFrom(record.value());
    ...
but that fails at compile time: parseFrom complains: The method parseFrom(ByteBuffer) in the type Solidlist.SolidList is not applicable for the arguments (String). So, I try using a ByteBuffer:
KafkaConsumer<Long, ByteBuffer> consumer = new KafkaConsumer<Long, ByteBuffer>(properties);
....
ConsumerRecords<Long, ByteBuffer> records = consumer.poll(100);
for (ConsumerRecord<Long, ByteBuffer> record : records) {
    SolidList solidList = SolidList.parseFrom(record.value());
    ...
Now the error occurs at execution time, still on parseFrom(): Exception in thread "main" java.lang.ClassCastException: java.lang.String cannot be cast to java.nio.ByteBuffer. I know it is a java.lang.String! So I go back to the original and try using it as a byte array:
SolidList solidList = SolidList.parseFrom(record.value().getBytes());
Now the error at execution time is: Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException$InvalidWireTypeException: Protocol message tag had invalid wire type.
The protobuf documentation states for the C++ serialization API bool SerializeToString(string* output) const;: serializes the message and stores the bytes in the given string. Note that the bytes are binary, not text; we only use the string class as a convenient container.
TL;DR: In consequence, how should I interpret the protobuf C++ "binary bytes" in Java?
This seems related (it is the opposite) but doesn't help: Protobuf Java To C++ Serialization [Binary]
Thanks in advance.
Try implementing a Deserializer and pass it to the KafkaConsumer constructor as the value deserializer. It could look like this:
class SolidListDeserializer implements Deserializer<SolidList> {
    public SolidList deserialize(final String topic, final byte[] data) {
        try {
            return SolidList.parseFrom(data);
        } catch (InvalidProtocolBufferException e) {
            throw new SerializationException("Failed to parse SolidList", e);
        }
    }
    ...
}
...
KafkaConsumer<Long, SolidList> consumer = new KafkaConsumer<>(props, new LongDeserializer(), new SolidListDeserializer());
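If you prefer to configure deserializers through properties instead of constructor arguments, Kafka's built-in ByteArrayDeserializer also keeps the value as raw bytes, which is exactly what parseFrom needs. A self-contained sketch; the broker address, group id and topic name are assumptions, and the import of the generated SolidList class depends on your proto's java_package:
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import com.google.protobuf.InvalidProtocolBufferException;

public class SolidListConsumer {

    public static void main(String[] args) throws InvalidProtocolBufferException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "solidlist-consumer");      // assumed group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.LongDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<Long, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("solids")); // assumed topic name
            while (true) {
                ConsumerRecords<Long, byte[]> records = consumer.poll(100);
                for (ConsumerRecord<Long, byte[]> record : records) {
                    // record.value() is exactly the byte sequence SerializeToString() produced in C++
                    SolidList solidList = SolidList.parseFrom(record.value());
                    System.out.println(solidList);
                }
            }
        }
    }
}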
Alternatively, you can read from Kafka as ConsumerRecords<Long, String> and then call SolidList.parseFrom(ByteBuffer.wrap(record.value().getBytes("UTF-8")));

Null value in spark streaming from Kafka

I have a simple program because I'm trying to receive data using Kafka. When I start a Kafka producer and send data, for example "Hello", I get this when I print the message: (null, Hello). I don't know why this null appears; is there any way to avoid it? I think it's due to the first parameter of Tuple2<String, String>, but I only want to print the second parameter. One more thing: when I print using System.out.println("inside map " + message); no message appears at all. Does someone know why? Thanks.
public static void main(String[] args) {
    SparkConf sparkConf = new SparkConf().setAppName("org.kakfa.spark.ConsumerData").setMaster("local[4]");
    // Substitute 127.0.0.1 with the actual address of your Spark Master (or use "local" to run in local mode)
    sparkConf.set("spark.cassandra.connection.host", "127.0.0.1");

    // Create the context with a 2-second batch size
    JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(2000));

    Map<String, Integer> topicMap = new HashMap<>();
    String[] topics = KafkaProperties.TOPIC.split(",");
    for (String topic : topics) {
        topicMap.put(topic, KafkaProperties.NUM_THREADS);
    }

    /* connection to cassandra */
    CassandraConnector connector = CassandraConnector.apply(sparkConf);
    System.out.println("+++++++++++ cassandra connector created ++++++++++++++++++++++++++++");

    /* Receive kafka inputs */
    JavaPairReceiverInputDStream<String, String> messages =
            KafkaUtils.createStream(jssc, KafkaProperties.ZOOKEEPER, KafkaProperties.GROUP_CONSUMER, topicMap);
    System.out.println("+++++++++++++ streaming-kafka connection done +++++++++++++++++++++++++++");

    JavaDStream<String> lines = messages.map(
        new Function<Tuple2<String, String>, String>() {
            public String call(Tuple2<String, String> message) {
                System.out.println("inside map " + message);
                return message._2();
            }
        }
    );

    messages.print();

    jssc.start();
    jssc.awaitTermination();
}
Q1) Null values:
Messages in Kafka are keyed, which means they all have a (key, value) structure.
When you see (null, Hello), it is because the producer published a (null, "Hello") value to the topic.
If you want to omit the key in your processing, map the original DStream to remove it: kafkaDStream.map(new Function<Tuple2<String, String>, String>() {...})
Q2) System.out.println("inside map "+ message); does not print. A couple of classic reasons:
Transformations are applied in the executors, so when running in a cluster that output will appear in the executors' logs and not on the driver.
Operations are lazy and DStreams need to be materialized for operations to be applied.
In this specific case, the JavaDStream<String> lines is never materialized, i.e. never used in an output operation, therefore the map is never executed.
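For example, adding an output operation on lines (or printing it instead of messages) forces the map, and its println, to run each batch. A minimal sketch:
// Materialize the mapped stream so the map function actually executes
lines.print();

jssc.start();
jssc.awaitTermination();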

Kafka: Java client that blocks read and doesn't poll

I was wondering if there is Java client code for a Kafka consumer that enables reading data via push notifications / a blocking read, instead of the current poll:
final KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList("test"));

new Thread() {
    @Override
    public void run() {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100); // poll
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset = %d, key = %s, value = %s", record.offset(), record.key(),
                        record.value());
                System.out.println();
                callback.onMessage(record.value());
            }
        }
    }
}.start();
If I understand your question correctly, you wish for the data to be pushed to the consumer when it becomes available, instead of having the consumer be responsible for checking for new data and pulling it.
On https://kafka.apache.org/08/design.html they discuss push vs. pull and the choice made in Kafka, where the producer pushes messages to the broker and the consumer pulls from the broker. They also mention the attempts made to mitigate the downsides of a pull-based approach. If you require a push-based pub/sub messaging system, you may want to look at Scribe or Flume, which are also mentioned in the link :)
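One of the mitigations discussed there is the long poll: consumer.poll(timeout) already blocks until records arrive or the timeout expires, so passing a large timeout turns the loop from the question into an effectively blocking read. A minimal sketch based on the question's own variables:
// Block for up to 30 seconds waiting for records instead of spinning every 100 ms;
// poll() returns as soon as data is available.
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(30_000L);
    for (ConsumerRecord<String, String> record : records) {
        callback.onMessage(record.value());
    }
}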
